Abstract: Given a dictionary that consists of multiple blocks 
and a signal that lives in the range space of only a few blocks, 
we study the problem of finding a block-sparse representation of 
the signal, i.e., a representation that uses the minimum number 
of blocks. Motivated by signal/image processing and computer 
vision applications, such as face recognition, we consider the 
block-sparse recovery problem in the case where the number 
of atoms in each block is arbitrary, possibly much larger than 
the dimension of the underlying subspace. To find a block- 
sparse representation of a signal, we propose two classes of 
non-convex optimization programs, which aim to minimize the 
number of nonzero coefficient blocks and the number of nonzero 
reconstructed vectors from the blocks, respectively. Since both 
classes of problems are NP-hard, we propose convex relaxations 
and derive conditions under which each class of the convex 
programs is equivalent to the original non-convex formulation. 
Our conditions depend on the notions of mutual and cumu- 
lative subspace coherence of a dictionary, which are natural 
generalizations of existing notions of mutual and cumulative 
coherence. We evaluate the performance of the proposed convex 
programs through simulations as well as real experiments on face 
recognition. We show that treating the face recognition problem 
as a block-sparse recovery problem improves the state-of-the-art 
results by 10% with only 25% of the training data. 

Index Terms — Block-sparse signals, convex optimization, sub- 
spaces, principal angles, face recognition. 

I. Introduction 
A. Recovery of Sparse Signals 

Sparse signal recovery has drawn increasing attention in 
many areas such as signal/image processing, computer vision, 
machine learning, and bioinformatics (see e.g., [1], [2], [3], [4] 
and the references therein). The key assumption behind sparse 
signal recovery is that an observed signal y can be written as 
a linear combination of a few atoms of a given dictionary B. 

More formally, consider an underdetermined system of linear equations of the form $y = Bc$, where $B \in \mathbb{R}^{D \times N}$ has more columns than rows, hence allowing infinitely many solutions for a given $y$. Sparsity of the desired solution arises in many problems and can be used to restrict the set of possible solutions. In principle, the problem of finding the sparsest representation of a given signal can be cast as the following optimization program

$$P_{\ell_0}: \quad \min \|c\|_0 \quad \text{s.t.} \quad y = Bc, \qquad (1)$$

where $\|c\|_0$ is the $\ell_0$ quasi-norm of $c$, which counts the number of nonzero elements of $c$. We say that a vector $c$ is $k$-sparse if it has at most $k$ nonzero elements. While finding the sparse representation of a given signal using $P_{\ell_0}$ is NP-hard [5], the pioneering work of Donoho [6] and Candes [7] showed that, under appropriate conditions, this problem can be solved efficiently as

$$P_{\ell_1}: \quad \min \|c\|_1 \quad \text{s.t.} \quad y = Bc. \qquad (2)$$

Since then, there has been an outburst of research articles addressing conditions under which the two optimization programs, $P_{\ell_1}$ and $P_{\ell_0}$, are equivalent. Most of these results are based on the notions of mutual/cumulative coherence [8], [9] and restricted isometry property [7], [10], which we describe next. Throughout the paper, we assume that the columns of $B$ have unit Euclidean norm.

Mutual/Cumulative Coherence. The mutual coherence of a dictionary $B$ is defined as

$$\mu = \max_{i \neq j} |b_i^\top b_j|, \qquad (3)$$

where $b_i$ denotes the $i$-th column of $B$ of unit Euclidean norm. [8] and [9] show that if the sufficient condition

$$(2k - 1)\mu < 1 \qquad (4)$$

holds, then the optimization programs $P_{\ell_1}$ and $P_{\ell_0}$ are equivalent and recover the $k$-sparse representation of a given signal. While $\mu$ can be easily computed, it does not characterize a dictionary very well since it measures only the most extreme correlations in the dictionary.
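As a rough numerical illustration of the mutual coherence in (3) and the sufficient condition in (4), the following NumPy sketch evaluates both on a small dictionary. The function names `mutual_coherence` and `coherence_condition_holds` and the toy dictionary `B_toy` are assumptions made for this example and do not come from the paper.

```python
import numpy as np

def mutual_coherence(B):
    """Return max over i != j of |b_i^T b_j| for the columns b_i of B."""
    B = np.asarray(B, dtype=float)
    B = B / np.linalg.norm(B, axis=0)   # normalize columns to unit Euclidean norm
    G = np.abs(B.T @ B)                 # absolute inner products between atoms
    np.fill_diagonal(G, 0.0)            # exclude the i == j terms
    return float(G.max())

def coherence_condition_holds(mu, k):
    """Check the sufficient condition (2k - 1) * mu < 1 for sparsity level k."""
    return (2 * k - 1) * mu < 1

# Hypothetical toy dictionary: two orthogonal atoms plus their (normalized) sum.
B_toy = np.array([[1.0, 0.0, 1.0],
                  [0.0, 1.0, 1.0]])
mu_toy = mutual_coherence(B_toy)
print(mu_toy, coherence_condition_holds(mu_toy, 1), coherence_condition_holds(mu_toy, 2))
```

For this dictionary the coherence equals $1/\sqrt{2}$, so the condition holds for $k = 1$ but fails for $k = 2$, which shows how quickly a single pair of correlated atoms limits the guarantee.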

To better characterize a dictionary, cumulative coherence measures the maximum total coherence between a fixed atom and a collection of $k$ other atoms. Specifically, the cumulative coherence associated to a positive integer $k$ [8] is defined as

$$\zeta_k = \max_{\Lambda_k} \max_{i \notin \Lambda_k} \sum_{j \in \Lambda_k} |b_i^\top b_j|, \qquad (5)$$

where $\Lambda_k$ denotes a set of $k$ different indices from $\{1, \ldots, N\}$. Note that for $k = 1$, we have $\zeta_1 = \mu$. Although cumulative coherence is, in general, more difficult to compute than mutual coherence, it provides sharper results for the equivalence of $P_{\ell_1}$ and $P_{\ell_0}$. In particular, [8] shows that if

$$\zeta_k + \zeta_{k-1} < 1, \qquad (6)$$

then the optimization programs $P_{\ell_1}$ and $P_{\ell_0}$ are equivalent and recover the $k$-sparse representation of a given signal.
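The cumulative coherence in (5) can be evaluated without enumerating index sets: for a fixed atom, the worst set of $k$ other atoms is simply the $k$ atoms most coherent with it. The sketch below uses this observation; the function name `cumulative_coherence` and the toy dictionary are hypothetical choices for illustration.

```python
import numpy as np

def cumulative_coherence(B, k):
    """Return max over atoms i of the sum of the k largest |b_i^T b_j|, j != i."""
    B = np.asarray(B, dtype=float)
    n_atoms = B.shape[1]
    if not 1 <= k <= n_atoms - 1:
        raise ValueError("k must satisfy 1 <= k <= N - 1")
    B = B / np.linalg.norm(B, axis=0)        # unit-norm columns
    G = np.abs(B.T @ B)
    np.fill_diagonal(G, 0.0)                 # an atom is never paired with itself
    top_k = np.sort(G, axis=1)[:, -k:]       # k largest coherences in each row
    return float(top_k.sum(axis=1).max())

# Hypothetical toy dictionary: two orthogonal atoms plus their (normalized) sum.
B_toy = np.array([[1.0, 0.0, 1.0],
                  [0.0, 1.0, 1.0]])
zeta_1 = cumulative_coherence(B_toy, 1)
zeta_2 = cumulative_coherence(B_toy, 2)
print(zeta_1, zeta_2)
```

Here `zeta_1` coincides with the mutual coherence of the dictionary, and `zeta_2` never exceeds twice that value.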

Restricted Isometry Property. An alternative sufficient condition for the equivalence between $P_{\ell_1}$ and $P_{\ell_0}$ is based on the so-called restricted isometry property (RIP) [7], [10]. For a positive integer $k$, the restricted isometry constant of a dictionary $B$ is defined as the smallest constant $\delta_k$ for which

$$(1 - \delta_k)\|c\|_2^2 \leq \|Bc\|_2^2 \leq (1 + \delta_k)\|c\|_2^2 \qquad (7)$$

holds for all $k$-sparse vectors $c$. [10] shows that if $\delta_{2k} < \sqrt{2} - 1$, then $P_{\ell_1}$ and $P_{\ell_0}$ are equivalent. The bound in this result has been further improved and [11] shows that if $\delta_{2k} < 0.4652$, then $P_{\ell_1}$ and $P_{\ell_0}$ are equivalent.

Fig. 1. Top (block lengths $m_1 = 100$, $m_2 = 100$, $m_3 = m_4 = \cdots = m_{100} = 1$): a block-sparse vector is not necessarily sparse. In this example, 2 nonzero blocks out of 100 blocks correspond to 200 nonzero elements out of 298 elements. Bottom (block lengths $m_1 = m_2 = \cdots = m_{100} = 50$): a sparse vector is not necessarily block-sparse. In this example, all 100 blocks are nonzero each having one nonzero element. However, this gives rise to only 100 nonzero elements out of 5,000 elements.

B. Recovery of Block-Sparse Signals 

Recently, there has been growing interest in recovering 
sparse representations of signals in a union of a large number 
of subspaces, under the assumption that the signals live in 
the direct sum of only a few subspaces. Such a representation 
whose nonzero elements appear in a few blocks is called a 
block-sparse representation. Block sparsity arises in various 
applications such as reconstructing multi-band signals [12], [13], measuring gene expression levels [4], face/digit/speech recognition [14], [15], [16], [17], clustering of data on multiple subspaces [18], [19], [20], [21], finding exemplars in datasets [22], multiple measurement vectors [23], [24], [25], [26], etc.

The recovery of block-sparse signals involves solving a system of linear equations of the form

$$y = Bc = \begin{bmatrix} B[1] & \cdots & B[n] \end{bmatrix} c, \qquad (8)$$

where $B$ consists of $n$ blocks $B[i] \in \mathbb{R}^{D \times m_i}$. The main difference with respect to classical sparse recovery is that the desired solution of (8) corresponds to a few nonzero blocks rather than a few nonzero elements of $B$. We say that a vector $c^\top = [c[1]^\top \ \cdots \ c[n]^\top]$ is $k$-block-sparse if at most $k$ blocks $c[i] \in \mathbb{R}^{m_i}$ are different from zero. Note that, in general, a block-sparse vector is not necessarily sparse and vice versa, as shown in Figure 1.
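The distinction between sparsity and block-sparsity can be made concrete by counting nonzero blocks and nonzero elements separately. The following sketch does this for two hypothetical coefficient vectors whose block lengths are chosen for illustration: one with two long nonzero blocks, and one with a single nonzero entry in every block. The helper name `count_nonzeros` is an assumption of this example.

```python
def count_nonzeros(blocks):
    """Return (nonzero blocks, nonzero elements, total elements).

    `blocks` is a list of lists, one inner list per coefficient block c[i].
    """
    nonzero_blocks = sum(1 for block in blocks if any(v != 0 for v in block))
    nonzero_elements = sum(1 for block in blocks for v in block if v != 0)
    total_elements = sum(len(block) for block in blocks)
    return nonzero_blocks, nonzero_elements, total_elements

# Block-sparse but not sparse: two full blocks of length 100, then 98 zero blocks of length 1.
top = [[1.0] * 100, [1.0] * 100] + [[0.0] for _ in range(98)]

# Sparse but not block-sparse: 100 blocks of length 50, one nonzero entry in each.
bottom = [[1.0] + [0.0] * 49 for _ in range(100)]

top_counts = count_nonzeros(top)
bottom_counts = count_nonzeros(bottom)
print(top_counts, bottom_counts)
```

The first vector uses only 2 of 100 blocks yet most of its entries are nonzero, whereas the second has a small fraction of nonzero entries but touches every block.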

The problem of finding a representation of a signal $y$ that uses the minimum number of blocks of $B$ can be cast as the following optimization program

$$P_{\ell_q/\ell_0}: \quad \min \sum_{i=1}^{n} I(\|c[i]\|_q) \quad \text{s.t.} \quad y = Bc, \qquad (9)$$

where $q \geq 0$ and $I(\cdot)$ is the indicator function, which is zero when its argument is zero and is one otherwise. In fact, the objective function in (9) counts the number of nonzero blocks of a solution. However, solving (9) is an NP-hard problem as it requires searching exhaustively over all choices of a few blocks of $B$ and checking whether they span the observed signal. The $\ell_1$ relaxation of $P_{\ell_q/\ell_0}$ has the following form

$$P_{\ell_q/\ell_1}: \quad \min \sum_{i=1}^{n} \|c[i]\|_q \quad \text{s.t.} \quad y = Bc. \qquad (10)$$

For $q \geq 1$, the optimization program $P_{\ell_q/\ell_1}$ is convex and can be solved efficiently using convex programming tools [27].

Remark 1: For $q = 1$, the convex program $P_{\ell_1/\ell_1}$ is the same as $P_{\ell_1}$ in (2) used for sparse recovery. In other words, while the $\ell_1$ optimization program, under some conditions, can recover a sparse representation of a signal, it can also recover a block-sparse representation, under appropriate conditions, as we will discuss in this paper.
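To illustrate the objective of the relaxed program and the observation in Remark 1, the sketch below evaluates the mixed norm, i.e., the sum over blocks of the $\ell_q$ norm of each block, for a few values of $q$. It only evaluates the objective and does not solve the convex program; the function name `mixed_norm` and the example vector are assumptions of this illustration.

```python
import math

def mixed_norm(blocks, q):
    """Sum over blocks of the l_q norm of each coefficient block, for q >= 1."""
    total = 0.0
    for block in blocks:
        if q == math.inf:
            total += max(abs(v) for v in block)
        else:
            total += sum(abs(v) ** q for v in block) ** (1.0 / q)
    return total

# Hypothetical coefficient vector with three blocks of lengths 2, 2 and 1.
c_blocks = [[3.0, -4.0], [0.0, 0.0], [1.0]]

l1_norm = sum(abs(v) for block in c_blocks for v in block)
norm_q1 = mixed_norm(c_blocks, 1)          # equals the plain l_1 norm
norm_q2 = mixed_norm(c_blocks, 2)          # 5 + 0 + 1
norm_qinf = mixed_norm(c_blocks, math.inf) # 4 + 0 + 1
print(l1_norm, norm_q1, norm_q2, norm_qinf)
```

For $q = 1$ the block structure disappears from the objective, which is why the same $\ell_1$ program serves both sparse and block-sparse recovery.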

The works of [28], [29], [30] study conditions under which, for the special case of $q = 2$, $P_{\ell_2/\ell_1}$ and $P_{\ell_2/\ell_0}$ are equivalent. These conditions are based on generalizations of mutual coherence and restricted isometry constant, as described next.

Block-Coherence. The work of [29] assumes that the blocks have linearly independent columns and are of the same length $d$, i.e., $\operatorname{rank}(B[i]) = m_i = d$. Under these assumptions, [29] defines the block-coherence of a dictionary $B$ as

$$\mu_B = \max_{i \neq j} \frac{1}{d} \sigma_1(B[i]^\top B[j]), \qquad (11)$$

where $\sigma_1(\cdot)$ denotes the largest singular value of the given matrix. Also, the subcoherence of $B$ is defined as $\nu = \max_i \mu_i$, where $\mu_i$ denotes the mutual coherence for the $i$-th block. [29] shows that if

$$(2k - 1) d \mu_B < 1 - (d - 1)\nu, \qquad (12)$$

then $P_{\ell_2/\ell_1}$ and $P_{\ell_2/\ell_0}$ are equivalent and recover the $k$-block-sparse representation of a given signal.

Block-RIP. [30] assumes that the blocks have linearly independent columns, although their lengths need not be equal. Under this assumption, [30] defines the block restricted isometry constant of $B$ as the smallest constant $\delta_{B,k}$ such that

$$(1 - \delta_{B,k})\|c\|_2^2 \leq \|Bc\|_2^2 \leq (1 + \delta_{B,k})\|c\|_2^2 \qquad (13)$$

holds for every $k$-block-sparse vector $c$. Analogous to the conventional sparse recovery results, [30] shows that if $\delta_{B,2k} < \sqrt{2} - 1$, then $P_{\ell_2/\ell_1}$ and $P_{\ell_2/\ell_0}$ are equivalent.

The work of [28] proposes an alternative analysis framework for block-sparse recovery using $P_{\ell_2/\ell_1}$ in the special case of Gaussian dictionaries. By analyzing the nullspace of the dictionary, it shows that if the blocks have linearly independent columns, perfect recovery is achieved with high probability as the length of the signal, $D$, grows to infinity.

An alternative approach to recover the block-sparse representation of a given signal is to solve the optimization program

$$P'_{\ell_q/\ell_0}: \quad \min \sum_{i=1}^{n} I(\|B[i]c[i]\|_q) \quad \text{s.t.} \quad y = Bc, \qquad (14)$$

for $q \geq 0$. Notice that the solution to this problem coincides with that of $P_{\ell_q/\ell_0}$ for blocks with linearly independent columns since $\|B[i]c[i]\|_q > 0$ if and only if $\|c[i]\|_q > 0$. Nevertheless, $P'_{\ell_q/\ell_0}$ is an NP-hard problem. In the case of $q \geq 1$, the following $\ell_1$ relaxation

$$P'_{\ell_q/\ell_1}: \quad \min \sum_{i=1}^{n} \|B[i]c[i]\|_q \quad \text{s.t.} \quad y = Bc, \qquad (15)$$

is a convex program and can be solved efficiently. The work of [31] studies conditions under which, for the special case of $q = 2$, $P'_{\ell_2/\ell_1}$ and $P'_{\ell_2/\ell_0}$ are equivalent. The conditions are based on the notion of mutual subspace incoherence, as described next.



Mutual Subspace Incoherence. The work of [31] introduces the notion of mutual subspace incoherence of $B$, which is defined as

$$\mu_S = \max_{i \neq j} \max_{x \in S_i, z \in S_j} \frac{|x^\top z|}{\|x\|_2 \|z\|_2}, \qquad (16)$$

where $S_i = \operatorname{span}(B[i])$. Under the assumption that the blocks have linearly independent columns and the subspaces spanned by each block are disjoint, [31] shows that $P'_{\ell_2/\ell_1}$ and $P'_{\ell_2/\ell_0}$ are equivalent if

$$(2k - 1)\mu_S < 1. \qquad (17)$$



As mentioned above, the state-of-the-art block-sparse recovery methods [28], [29], [30], [31], [32] consider dictionaries whose blocks consist of linearly independent vectors, which we refer to as non-redundant blocks. However, in signal/image processing, machine learning, and computer vision problems such as face recognition [14], [15] and motion segmentation [18], [33], blocks of a dictionary consist of data points and often the number of data points in each block exceeds the dimension of the underlying subspace. For example, in automatic face recognition, the number of training images in each block of the dictionary is often more than the dimension of the face subspace, known to be 9 under a fixed pose and varying illumination [34]. One motivation for this is the fact that having more data in each block better captures the underlying distribution of the data in each subspace and, as expected, increases the performance of tasks such as classification. However, to the best of our knowledge, existing theoretical results have not addressed recovery in dictionaries whose blocks have linearly dependent atoms, which we refer to as redundant blocks. Moreover, theoretical analysis for the equivalence between $P_{\ell_q/\ell_1}$ and $P_{\ell_q/\ell_0}$ as well as the equivalence between $P'_{\ell_q/\ell_1}$ and $P'_{\ell_q/\ell_0}$ has been restricted to only $q = 2$. Nevertheless, empirical studies in some applications [35] have shown better block-sparse recovery performance for $q \neq 2$. Therefore, there is a need for analyzing the performance of each class of the convex programs for arbitrary $q \geq 1$.

C. Paper Contributions 

In this paper, we consider the problem of block-sparse recovery using the two classes of convex programs $P_{\ell_q/\ell_1}$ and $P'_{\ell_q/\ell_1}$ for $q \geq 1$. Unlike the state of the art, we do not restrict the blocks of a dictionary to have linearly independent columns. Instead, we allow for both non-redundant and redundant blocks. In addition, we do not impose any restriction on the lengths of the blocks, such as requiring them to have the same length, and allow arbitrary and different lengths for the blocks. To characterize the relation between blocks of a dictionary, we introduce the notions of mutual and cumulative subspace coherence, which can be thought of as natural extensions of mutual and cumulative coherence from one-dimensional to multi-dimensional subspaces. Based on these notions, we derive conditions under which the convex programs $P_{\ell_q/\ell_1}$ and $P'_{\ell_q/\ell_1}$ are equivalent to $P_{\ell_q/\ell_0}$ and $P'_{\ell_q/\ell_0}$, respectively. While the mutual subspace coherence is easier to compute, cumulative subspace coherence provides weaker conditions for block-sparse recovery using either of the convex programs. Thanks to our analysis framework and the introduced notions of subspace coherence, our block-sparse recovery conditions are weaker than those of the state-of-the-art works, which have studied the special case of $q = 2$. To the best of our knowledge, our work is the first to analyze both non-redundant and redundant blocks, and our theoretical framework does not separate the two cases but treats both in a unified way.

We evaluate the performance of the proposed convex programs on synthetic data and in the problem of face recognition. Our results show that treating face recognition as a block-sparse recovery problem can significantly improve the recognition performance. In fact, we show that the convex program $P'_{\ell_q/\ell_1}$ outperforms the state-of-the-art face recognition methods on a real-world dataset.

Paper Organization. The paper is organized as follows. In Section II, we introduce some notations and notions that characterize the relation between blocks and the relation among atoms within each block of a dictionary. In Section III, we investigate conditions under which we can uniquely determine a block-sparse representation of a signal. In Sections IV and V, we consider the convex programs $P_{\ell_q/\ell_1}$ and $P'_{\ell_q/\ell_1}$, respectively, and study conditions under which they recover a block-sparse representation of a signal in the case of both non-redundant and redundant blocks. In Section VI, we discuss the connection between our results and the problem of correcting sparse outlying entries in the observed signal. In Section VII, we evaluate the performance of the two classes of convex programs through a number of synthetic experiments as well as the real problem of face recognition. Finally, Section VIII concludes the paper.

II. Problem Setting 

We consider the problem of block-sparse recovery in a union of subspaces. We assume that the dictionary $B$ consists of $n$ blocks and the vectors in each block $B[i] \in \mathbb{R}^{D \times m_i}$ live in a linear subspace $S_i$ of dimension $d_i$. Unlike the state-of-the-art block-sparse recovery literature, we do not restrict the blocks to have linearly independent columns. Instead, we allow for both non-redundant ($m_i = d_i$) and redundant ($m_i > d_i$) blocks. For reasons that will become clear in the subsequent sections, throughout the paper, we assume that the subspaces $\{S_i\}_{i=1}^{n}$ spanned by the columns of the blocks $\{B[i]\}_{i=1}^{n}$ are disjoint.

Definition 1: A collection of subspaces $\{S_i\}_{i=1}^{n}$ is called disjoint if each pair of different subspaces intersect only at the origin.

In order to characterize a dictionary $B$, we introduce two notions that characterize the relationship between the blocks and among the atoms of each block of the dictionary. We start by introducing notions that capture the inter-block relationships of a dictionary. To do so, we make use of the subspaces associated with the blocks.

Definition 2: The subspace coherence between two disjoint subspaces $S_i$ and $S_j$ is defined as

$$\mu(S_i, S_j) = \max_{x \in S_i, z \in S_j} \frac{|x^\top z|}{\|x\|_2 \|z\|_2} \in [0, 1). \qquad (18)$$

The mutual subspace coherence [31], $\mu_S$, is defined as the largest subspace coherence among all pairs of subspaces,

$$\mu_S = \max_{i \neq j} \mu(S_i, S_j). \qquad (19)$$

Notice from Definition 1 that two disjoint subspaces intersect only at the origin. Therefore, their subspace coherence is always smaller than one (see footnote 1). The following result shows how to compute the subspace coherence efficiently from the singular values of a matrix obtained from the subspace bases [36].

Proposition 1: Let $S_i$ and $S_j$ be two disjoint subspaces with orthonormal bases $A_i$ and $A_j$, respectively. The subspace coherence $\mu(S_i, S_j)$ is given by

$$\mu(S_i, S_j) = \sigma_1(A_i^\top A_j). \qquad (20)$$

It follows from Definition 2 that the mutual subspace coherence can be computed as

$$\mu_S = \max_{i \neq j} \sigma_1(A_i^\top A_j). \qquad (21)$$

Comparing this with the notion of block-coherence in (11), the main difference is that block-coherence, $\mu_B$, uses directly block matrices which are assumed to be non-redundant. However, mutual subspace coherence, $\mu_S$, uses orthonormal bases of the blocks that can be either non-redundant or redundant. The two notions coincide with each other when the blocks are non-redundant and consist of orthonormal vectors.
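Proposition 1 translates directly into a short computation: obtain an orthonormal basis for the span of each block, then take the largest singular value of the product of the bases. The sketch below is one way to do this with NumPy; the helper names, the rank tolerance `tol`, and the example blocks in $\mathbb{R}^3$ are assumptions of this illustration. Note that the blocks are allowed to be redundant, since only their span matters.

```python
import numpy as np

def orthonormal_basis(block, tol=1e-10):
    """Orthonormal basis for the column span of a (possibly redundant) block."""
    U, s, _ = np.linalg.svd(np.asarray(block, dtype=float), full_matrices=False)
    rank = int(np.sum(s > tol * s[0]))       # numerical rank of the block
    return U[:, :rank]

def subspace_coherence(block_i, block_j):
    """Largest singular value of A_i^T A_j for orthonormal bases A_i, A_j."""
    A_i = orthonormal_basis(block_i)
    A_j = orthonormal_basis(block_j)
    return float(np.linalg.svd(A_i.T @ A_j, compute_uv=False)[0])

def mutual_subspace_coherence(blocks):
    """Largest subspace coherence over all pairs of different blocks."""
    n = len(blocks)
    return max(subspace_coherence(blocks[i], blocks[j])
               for i in range(n) for j in range(i + 1, n))

# Hypothetical blocks: a redundant block spanning the xy-plane (3 atoms, dimension 2),
# a redundant block spanning the line through (0, 1, 1), and the z-axis.
block_1 = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 1.0], [0.0, 0.0, 0.0]])
block_2 = np.array([[0.0, 0.0], [1.0, 2.0], [1.0, 2.0]])
block_3 = np.array([[0.0], [0.0], [1.0]])

mu_12 = subspace_coherence(block_1, block_2)
mu_S = mutual_subspace_coherence([block_1, block_2, block_3])
print(mu_12, mu_S)
```

The line through $(0, 1, 1)$ makes a 45 degree angle with both the xy-plane and the z-axis, so both quantities are approximately $1/\sqrt{2}$.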

While the mutual subspace coherence can be easily com- 
puted, it has the shortcoming of not characterizing very well 
the collection of subspaces because it only reflects the most 
extreme correlations between subspaces. Thus, we define a 
notion that better characterizes the relationship between the 
blocks of a dictionary. 

Definition 3: Let $\Lambda_k$ denote a subset of $k$ different elements from $\{1, \ldots, n\}$. The $k$-cumulative subspace coherence is defined as

$$\zeta_k = \max_{\Lambda_k} \max_{i \notin \Lambda_k} \sum_{j \in \Lambda_k} \mu(S_i, S_j). \qquad (22)$$

Roughly speaking, the $k$-cumulative subspace coherence measures the maximum total subspace coherence between a fixed subspace and a collection of $k$ other subspaces. Note that for $k = 1$, we have $\zeta_1 = \mu_S$.

Mutual/cumulative subspace coherence can be thought of as natural extensions of mutual/cumulative coherence, defined in (3) and (5). In fact, they are equivalent to each other for the case of one-dimensional subspaces, where each block of the dictionary consists of a single atom. The following Lemma shows the relationship between mutual and cumulative subspace coherence of a dictionary.

Footnote 1: Note that the smallest principal angle [36] between $S_i$ and $S_j$, $\theta(S_i, S_j)$, is related to the subspace coherence by $\mu(S_i, S_j) = \cos(\theta(S_i, S_j))$. Thus, $\mu_S$ is the cosine of the smallest principal angle among all pairs of different subspaces.

Fig. 2. Four one-dimensional subspaces in a two-dimensional space. $S_1$ and $S_2$ are orthogonal to $S_3$ and $S_4$, respectively.

Lemma 1: Consider a dictionary $B$, which consists of $n$ blocks. For every $k < n$, we have

$$\zeta_k \leq k \mu_S. \qquad (23)$$

The proof of Lemma 1 is straightforward and is provided in the Appendix. While computing $\zeta_k$ is, in general, more costly than computing $\mu_S$, it follows from Lemma 1 that conditions for block-sparse recovery based on $\zeta_k$ are weaker than those based on $\mu_S$, as we will show in the next sections. In fact, for a dictionary, $\zeta_k$ can be much smaller than $k\mu_S$, which results in weaker block-sparse recovery conditions based on $\zeta_k$. To see this, consider the four one-dimensional subspaces shown in Figure 2, where $S_1$ and $S_2$ are orthogonal to $S_3$ and $S_4$, respectively. Also, the principal angles between $S_1$ and $S_2$ as well as $S_3$ and $S_4$ are equal to $\theta < \pi/4$. Hence, the ordered subspace coherences are $0 \leq 0 < \sin(\theta) \leq \sin(\theta) < \cos(\theta) \leq \cos(\theta)$. One can verify that

$$\zeta_3 = \cos(\theta) + \sin(\theta) < 3\mu_S = 3\cos(\theta).$$

In fact, for small values of $\theta$, $\zeta_3$ is much smaller than $3\mu_S$ (see footnote 2).
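The four-line example can be checked numerically. For one-dimensional subspaces in the plane, the subspace coherence of two lines is the absolute cosine of the angle between their directions, and the cumulative subspace coherence for a fixed line is the sum of its $k$ largest coherences with the other lines. The sketch below uses an arbitrary angle of 0.1 radians; the function names and the angle are assumptions of this illustration.

```python
import math

def line_coherence(angle_a, angle_b):
    """Subspace coherence of two lines through the origin in the plane."""
    return abs(math.cos(angle_a - angle_b))

def cumulative_subspace_coherence(coh, k):
    """Cumulative subspace coherence from a symmetric table coh[i][j]."""
    n = len(coh)
    best = 0.0
    for i in range(n):
        others = sorted((coh[i][j] for j in range(n) if j != i), reverse=True)
        best = max(best, sum(others[:k]))   # k most coherent other subspaces
    return best

theta = 0.1
# S1 at angle 0, S2 at angle theta, S3 orthogonal to S1, S4 orthogonal to S2.
angles = [0.0, theta, math.pi / 2, theta + math.pi / 2]
coh = [[line_coherence(a, b) for b in angles] for a in angles]

mu_S = max(coh[i][j] for i in range(4) for j in range(4) if i != j)
zeta_3 = cumulative_subspace_coherence(coh, 3)
print(zeta_3, 3 * mu_S)
```

With this angle the cumulative quantity is close to 1.09 while three times the mutual subspace coherence is close to 2.99, in line with the inequality stated above.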
Next, we introduce notions that capture the intra-block characteristics of a dictionary.

Definition 4: Let $q > 0$. For a dictionary $B$, define the intra-block $q$-restricted isometry constant, $\epsilon_q$, as the smallest constant such that for every $i$ there exists a full column-rank submatrix $\bar{B}[i] \in \mathbb{R}^{D \times d_i}$ of $B[i] \in \mathbb{R}^{D \times m_i}$ such that for every $\bar{c}[i]$ we have

$$(1 - \epsilon_q)\|\bar{c}[i]\|_q^2 \leq \|\bar{B}[i]\bar{c}[i]\|_2^2 \leq (1 + \epsilon_q)\|\bar{c}[i]\|_q^2. \qquad (24)$$

Roughly speaking, $\epsilon_q$ characterizes the best $q$-restricted isometry property among all submatrices of $B[i]$ that span subspace $S_i$. When $q = 2$, for a dictionary with non-redundant blocks, where $\bar{B}[i] = B[i]$, $\epsilon_2$ coincides with the 1-block restricted isometry constant of $B$ defined in (13), i.e., $\epsilon_2 = \delta_{B,1}$. Thus, $\epsilon_q$ can be thought of as a generalization of the 1-block restricted isometry constant, $\delta_{B,1}$, to generic dictionaries with both non-redundant and redundant blocks and arbitrary $q \geq 1$.

Footnote 2: Another notion, which can be computed efficiently, is the sum of the $k$ largest subspace coherences, $u_k = \mu_1 + \cdots + \mu_k$, where the sorted subspace coherences among all pairs of different subspaces are denoted by $\mu_S = \mu_1 \geq \mu_2 \geq \mu_3 \geq \cdots$. We can show that $\zeta_k \leq u_k \leq k\mu_S$. In the example of Figure 2, $u_3 = 2\cos(\theta) + \sin(\theta)$, which is between $\zeta_3$ and $3\mu_S$.

Definition 5: Let $q > 0$. For a dictionary $B$, define the upper intra-block $q$-restricted isometry constant, $\sigma_q$, as the smallest constant such that for every $i$ and $c[i]$ we have

$$\|B[i]c[i]\|_2^2 \leq (1 + \sigma_q)\|c[i]\|_q^2. \qquad (25)$$

While in general $\epsilon_q \leq \sigma_q$, for the special case of non-redundant blocks, where $\bar{B}[i] = B[i]$, we have $\epsilon_q = \sigma_q$.

Remark 2: It is important to note that the theory developed in this paper holds for any $q > 0$. However, as we will see, $q \geq 1$ leads to convex programs that can be solved efficiently. Hence, we focus our attention on this case.
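For the special case $q = 2$, the upper intra-block constant of Definition 5 has a simple closed form, because the largest value of $\|B[i]c\|_2^2 / \|c\|_2^2$ over nonzero $c$ is the squared largest singular value of $B[i]$. The constant is therefore the maximum over blocks of that squared singular value, minus one. The sketch below computes it under this observation; the function name and the two example blocks are assumptions of this illustration, and other values of $q$ would need a different computation.

```python
import numpy as np

def upper_intra_block_constant_q2(blocks):
    """Smallest s with ||B[i] c||_2^2 <= (1 + s) ||c||_2^2 for all blocks and all c."""
    # np.linalg.norm(M, 2) is the spectral norm, i.e. the largest singular value of M.
    return max(np.linalg.norm(block, 2) ** 2 for block in blocks) - 1.0

# Hypothetical non-redundant block: two unit-norm atoms at an angle of 60 degrees.
block_a = np.array([[1.0, 0.5],
                    [0.0, np.sqrt(3.0) / 2.0]])
# Hypothetical redundant block: three unit-norm atoms in the plane.
block_c = np.array([[1.0, 0.0, 1.0 / np.sqrt(2.0)],
                    [0.0, 1.0, 1.0 / np.sqrt(2.0)]])

sigma_a = upper_intra_block_constant_q2([block_a])
sigma_c = upper_intra_block_constant_q2([block_c])
sigma_all = upper_intra_block_constant_q2([block_a, block_c])
print(sigma_a, sigma_c, sigma_all)
```

The redundant block yields the larger value, illustrating that adding correlated atoms to a block increases the upper constant.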

III. Uniqueness of Block-Sparse Representation 

Consider a dictionary $B$ with $n$ blocks $B[i] \in \mathbb{R}^{D \times m_i}$ generated by disjoint subspaces $S_i$ of dimensions $d_i$. Let $y$ be a signal that has a block-sparse representation in $B$ using $k$ blocks indexed by $\{i_1, \ldots, i_k\}$. We can write

$$y = \sum_{l=1}^{k} B[i_l]c[i_l] = \sum_{l=1}^{k} s_{i_l}, \qquad (26)$$

where $s_{i_l} = B[i_l]c[i_l]$ is a vector in the subspace $S_{i_l}$. In this section, we investigate conditions under which we can uniquely recover the indices $\{i_l\}$ of the blocks/subspaces as well as the vectors $\{s_{i_l} \in S_{i_l}\}$ that generate a block-sparse representation of a given $y$. We will investigate the efficient recovery of such a block-sparse representation using the convex programs $P_{\ell_q/\ell_1}$ and $P'_{\ell_q/\ell_1}$ in the subsequent sections.

In general, uniqueness of $\{s_{i_l}\}$ is a weaker notion than the uniqueness of $\{c[i_l]\}$ since a unique set of coefficient blocks $\{c[i_l]\}$ uniquely determines the vectors $\{s_{i_l}\}$, but the converse is not necessarily true. More precisely, given $s_{i_l}$, the equation $s_{i_l} = B[i_l]c[i_l]$ does not have a unique solution $c[i_l]$ when $B[i_l]$ is redundant. The solution is unique only when the block is non-redundant. Therefore, the uniqueness conditions we present next are more general than the state-of-the-art results. While [30] and [29] provide conditions for the uniqueness of the blocks $\{i_l\}$ and the coefficient blocks $\{c[i_l]\}$, which only hold for non-redundant blocks, we provide conditions for the uniqueness of the blocks $\{i_l\}$ and the vectors $\{s_{i_l}\}$ for generic dictionaries with non-redundant or redundant blocks. We show the following result whose proof is provided in the Appendix.

Proposition 2: Let $\bar{B}[i] \in \mathbb{R}^{D \times d_i}$ be an arbitrary full column-rank submatrix of $B[i] \in \mathbb{R}^{D \times m_i}$ and define

$$\bar{B} = \begin{bmatrix} \bar{B}[1] & \cdots & \bar{B}[n] \end{bmatrix}. \qquad (27)$$

The blocks $\{i_l\}$ and the vectors $\{s_{i_l}\}$ that generate a $k$-block-sparse representation of a signal can be determined uniquely if and only if $\bar{B}c \neq 0$ for every $2k$-block-sparse vector $c \neq 0$.
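Reading the condition of Proposition 2 as requiring that no nonzero $2k$-block-sparse vector lies in the nullspace of the matrix of full column-rank submatrices, an equivalent statement is that every concatenation of $2k$ of those submatrices has full column rank. For a small number of blocks this can be checked by brute force, as in the sketch below. The function name and the example are assumptions of this illustration, and the exhaustive search is only practical for small $n$.

```python
import itertools
import numpy as np

def blocks_uniquely_identifiable(bar_blocks, k):
    """Brute-force check that every 2k-block-sparse c != 0 gives a nonzero product.

    `bar_blocks` is a list of full column-rank matrices, one per block.
    """
    n = len(bar_blocks)
    size = min(2 * k, n)
    # Full column rank for every group of `size` blocks implies it for smaller groups.
    for subset in itertools.combinations(range(n), size):
        stacked = np.hstack([bar_blocks[i] for i in subset])
        if np.linalg.matrix_rank(stacked) < stacked.shape[1]:
            return False
    return True

# Hypothetical example: three distinct lines in the plane, one atom per block.
lines = [np.array([[1.0], [0.0]]),
         np.array([[0.0], [1.0]]),
         np.array([[1.0], [1.0]])]
unique_k1 = blocks_uniquely_identifiable(lines, 1)
unique_k2 = blocks_uniquely_identifiable(lines, 2)
print(unique_k1, unique_k2)
```

Any two of the lines are linearly independent, so a 1-block-sparse representation is identifiable, but three vectors in the plane are always dependent, so the check fails for $k = 2$.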

Remark 3: Note that the disjointness of subspaces is a necessary condition for uniquely recovering the blocks that take part in a block-sparse representation of a signal. This comes from the fact that for $k = 1$, the uniqueness condition of Proposition 2 requires that any two subspaces intersect only at the origin.

Next, we state another uniqueness result that we will use in our theoretical analysis in the next sections. For a fixed $\epsilon \in [0, 1)$ and for each $i \in \{1, \ldots, n\}$ define

$$\mathbb{W}_{\epsilon,i} = \{ s_i \in S_i, \ 1 - \epsilon \leq \|s_i\|_2^2 \leq 1 + \epsilon \}, \qquad (28)$$

which is the set of all vectors in $S_i$ whose squared Euclidean norm is bounded by $1 + \epsilon$ from above and by $1 - \epsilon$ from below. Let $\Lambda_k = \{i_1, \ldots, i_k\}$ be a set of $k$ indices from $\{1, \ldots, n\}$. Define

$$\mathbb{B}_\epsilon(\Lambda_k) = \{ B_{\Lambda_k} = [s_{i_1} \ \cdots \ s_{i_k}], \ s_{i_l} \in \mathbb{W}_{\epsilon,i_l}, \ 1 \leq l \leq k \}, \qquad (29)$$

which is the set of matrices $B_{\Lambda_k} \in \mathbb{R}^{D \times k}$ whose columns are drawn from subspaces indexed by $\Lambda_k$ and whose norms are bounded according to (28). With abuse of notation, we use $B_k$ to indicate $B_{\Lambda_k} \in \mathbb{B}_\epsilon(\Lambda_k)$ whenever $\Lambda_k$ is clear from the context. For example, $B_n \in \mathbb{R}^{D \times n}$ indicates a matrix whose columns are drawn from all $n$ subspaces. We have the following result whose proof is provided in the Appendix.

Corollary 1: Let $\epsilon \in [0, 1)$. The blocks $\{i_l\}$ and the vectors $\{s_{i_l}\}$ that constitute a $k$-block-sparse representation of a signal can be determined uniquely if and only if $\operatorname{rank}(B_n) \geq 2k$ for every $B_n \in \mathbb{B}_\epsilon(\Lambda_n)$.

Note that the result of Corollary 1 still holds if we let the columns of $B_n$ have arbitrary nonzero norms, because the rank of a matrix does not change by scaling its columns with nonzero constants. However, as we will show in the next section, the bounds on the norms as in (28) appear when we analyze block-sparse recovery using the convex programs $P_{\ell_q/\ell_1}$ and $P'_{\ell_q/\ell_1}$. While checking the condition of Corollary 1 is not possible, as it requires computing every possible $s_i$ in $\mathbb{W}_{\epsilon,i}$, we use the result of Corollary 1 in our theoretical analysis in the next sections.

In the remainder of the paper, we assume that a given signal $y$ has a unique $k$-block-sparse representation in $B$. By uniqueness of a block-sparse representation, we mean that the blocks $\Lambda_k$ and the vectors $\{s_i \in S_i\}_{i \in \Lambda_k}$ for which $y = \sum_{i \in \Lambda_k} s_i$ can be determined uniquely. Under this assumption, we investigate conditions under which the convex programs $P_{\ell_q/\ell_1}$ and $P'_{\ell_q/\ell_1}$ recover the unique set of nonzero blocks $\Lambda_k$ and the unique vectors $\{s_i \in S_i\}_{i \in \Lambda_k}$.

IV. Block-Sparse Recovery via $P_{\ell_q/\ell_1}$

The problem of finding a block-sparse representation of a signal $y$ in a dictionary $B$ with non-redundant or redundant blocks can be cast as the optimization program

$$P_{\ell_q/\ell_0}: \quad \min \sum_{i=1}^{n} I(\|c[i]\|_q) \quad \text{s.t.} \quad y = Bc.$$

Since $P_{\ell_q/\ell_0}$ directly penalizes the norm of the coefficient blocks, it always recovers a representation of a signal with the minimum number of nonzero blocks. As $P_{\ell_q/\ell_0}$ is NP-hard, for $q \geq 1$, we consider its convex relaxation

$$P_{\ell_q/\ell_1}: \quad \min \sum_{i=1}^{n} \|c[i]\|_q \quad \text{s.t.} \quad y = Bc,$$

and, under a unified framework, study conditions under which $P_{\ell_q/\ell_1}$ and $P_{\ell_q/\ell_0}$ are equivalent for both the case of non-redundant and redundant blocks.

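To that end, let $\Lambda_k$ be a set of $k$ indices from $\{1, \ldots, n\}$ and $\Lambda_{\hat{k}}$ be the set of the remaining $n - k$ indices. Let $x$ be a nonzero vector in the intersection of $\oplus_{i \in \Lambda_k} S_i$ and $\oplus_{i \in \Lambda_{\hat{k}}} S_i$, where $\oplus$ denotes the direct sum operator. Let the minimum $\ell_q/\ell_1$-norm coefficient vector when we choose only the $k$ blocks of $B$ indexed by $\Lambda_k$ be

$$\breve{c}^* = \operatorname{argmin} \sum_{i \in \Lambda_k} \|c[i]\|_q \quad \text{s.t.} \quad x = \sum_{i \in \Lambda_k} B[i]c[i], \qquad (30)$$

and let the minimum $\ell_q/\ell_1$-norm coefficient vector when we choose the blocks indexed by $\Lambda_{\hat{k}}$ be

$$\hat{c}^* = \operatorname{argmin} \sum_{i \in \Lambda_{\hat{k}}} \|c[i]\|_q \quad \text{s.t.} \quad x = \sum_{i \in \Lambda_{\hat{k}}} B[i]c[i]. \qquad (31)$$

The following theorem gives conditions under which the convex program $P_{\ell_q/\ell_1}$ is guaranteed to successfully recover a $k$-block-sparse representation of a given signal.

Theorem 1: For all signals that have a unique $k$-block-sparse representation in $B$, the solution of the optimization program $P_{\ell_q/\ell_1}$ is equivalent to that of $P_{\ell_q/\ell_0}$, if and only if

$$\forall \Lambda_k, \ \forall x \in (\oplus_{i \in \Lambda_k} S_i) \cap (\oplus_{i \in \Lambda_{\hat{k}}} S_i), \ x \neq 0 \implies \sum_{i \in \Lambda_k} \|\breve{c}^*[i]\|_q < \sum_{i \in \Lambda_{\hat{k}}} \|\hat{c}^*[i]\|_q. \qquad (32)$$

Proof: ($\Longleftarrow$) Fix $\Lambda_k$ and $y$ in $\oplus_{i \in \Lambda_k} S_i$ and let $c^*$ be the solution of $P_{\ell_q/\ell_1}$. If $c^*$ has at most $k$ nonzero blocks, then by the uniqueness assumption the nonzero blocks are indexed by $\Lambda_k$. For the sake of contradiction, assume that $c^*$ has more than $k$ nonzero blocks, so $c^*$ is nonzero for some blocks in $\Lambda_{\hat{k}}$. Define

$$x = y - \sum_{i \in \Lambda_k} B[i]c^*[i] = \sum_{i \in \Lambda_{\hat{k}}} B[i]c^*[i]. \qquad (33)$$

From (33) we have that $x$ lives in the intersection of $\oplus_{i \in \Lambda_k} S_i$ and $\oplus_{i \in \Lambda_{\hat{k}}} S_i$. Let $\breve{c}^*$ and $\hat{c}^*$ be respectively the solutions of the optimization problems in (30) and (31), for $x$. We can write

$$x = \sum_{i \in \Lambda_k} B[i]\breve{c}^*[i] = \sum_{i \in \Lambda_{\hat{k}}} B[i]\hat{c}^*[i]. \qquad (34)$$

We also have the following inequalities

$$\sum_{i \in \Lambda_k} \|\breve{c}^*[i]\|_q < \sum_{i \in \Lambda_{\hat{k}}} \|\hat{c}^*[i]\|_q \leq \sum_{i \in \Lambda_{\hat{k}}} \|c^*[i]\|_q, \qquad (35)$$

where the first inequality follows from the sufficient condition in (32). The second inequality follows from the second equalities in (33) and (34) and the fact that $\hat{c}^*$ is the optimal solution of (31) for $x$. Using the first equalities in (33) and (34), we can rewrite $y$ as

$$y = \sum_{i \in \Lambda_k} B[i](c^*[i] + \breve{c}^*[i]), \qquad (36)$$

which implies that $c^* + \breve{c}^*$ is a solution of $y = Bc$. Finally, using (35) and the triangle inequality, we obtain

$$\sum_{i \in \Lambda_k} \|c^*[i] + \breve{c}^*[i]\|_q \leq \sum_{i \in \Lambda_k} \|c^*[i]\|_q + \sum_{i \in \Lambda_k} \|\breve{c}^*[i]\|_q < \sum_{i \in \Lambda_k} \|c^*[i]\|_q + \sum_{i \in \Lambda_{\hat{k}}} \|c^*[i]\|_q = \sum_{i=1}^{n} \|c^*[i]\|_q. \qquad (37)$$

This contradicts the optimality of $c^*$, since it means that $c^* + \breve{c}^*$, which is also a solution of $y = Bc$, has a strictly smaller $\ell_q/\ell_1$-norm than $c^*$.

($\Longrightarrow$) We prove this using contradiction. Assume there exist $\Lambda_k$ and $x$ in the intersection of $\oplus_{i \in \Lambda_k} S_i$ and $\oplus_{i \in \Lambda_{\hat{k}}} S_i$ for which the condition in (32) does not hold, i.e.,

$$\sum_{i \in \Lambda_{\hat{k}}} \|\hat{c}^*[i]\|_q \leq \sum_{i \in \Lambda_k} \|\breve{c}^*[i]\|_q. \qquad (38)$$

Thus, a solution of $x = Bc$ is given by $\hat{c}^*$ that is not $k$-block-sparse and has an $\ell_q/\ell_1$-norm that is smaller than or equal to any $k$-block-sparse solution, contradicting the equivalence of $P_{\ell_q/\ell_1}$ and $P_{\ell_q/\ell_0}$. ■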
The condition of Theorem 1 (and Theorem 2 in the next section) is closely related to the nullspace property in [25], [26], [28], [37]. However, the key difference is that we do not require the condition of Theorem 1 to hold for all feasible vectors of (30) and (31), denoted by $\breve{c}$ and $\hat{c}$, respectively. Instead, we only require the condition of Theorem 1 to hold for the optimal solutions of (30) and (31). Thus, while the nullspace property might be violated by some feasible vectors $\breve{c}$ and $\hat{c}$, our condition can still hold for the optimal solutions $\breve{c}^*$ and $\hat{c}^*$, guaranteeing the equivalence of the two optimization programs.

Notice that it is not possible to check the condition in (32) for every $\Lambda_k$ and for every $x$ in the intersection of $\oplus_{i \in \Lambda_k} S_i$ and $\oplus_{i \in \Lambda_{\hat{k}}} S_i$. In addition, the condition in (32) does not explicitly incorporate the inter-block and intra-block parameters of the dictionary. In what follows, we propose sufficient conditions that incorporate the inter-block and intra-block parameters of the dictionary and can be efficiently checked. We use the following Lemma whose proof is provided in the Appendix.

Lemma 2: Let $E_k \in \mathbb{R}^{D \times k}$ be a matrix whose columns are chosen from subspaces indexed by $\Lambda_k$ and $E_k \in \mathbb{B}_\alpha(\Lambda_k)$ for a fixed $\alpha \in [0, 1)$. Let $E_{\hat{k}} \in \mathbb{R}^{D \times (n-k)}$ be a matrix whose columns are chosen from subspaces indexed by $\Lambda_{\hat{k}}$ where the Euclidean norm of each column is less than or equal to $\sqrt{1 + \beta}$. We have



Proposition 3: For signals that have a unique $k$-block-sparse representation in $B$, the solution of the optimization program $P_{\ell_q/\ell_1}$ is equivalent to that of $P_{\ell_q/\ell_0}$, if

$$\sqrt{\frac{1 + \sigma_q}{1 + \epsilon_q}}\, \zeta_k + \zeta_{k-1} < \frac{1 - \epsilon_q}{1 + \epsilon_q}. \qquad (40)$$



Proof: Fix a set $\Lambda_k = \{i_1, \ldots, i_k\}$ of $k$ indices from $\{1, \ldots, n\}$ and let $\Lambda_{\hat{k}} = \{i_{k+1}, \ldots, i_n\}$ denote the set of the remaining indices. Consider a signal $x$ in the intersection of $\oplus_{i \in \Lambda_k} S_i$ and $\oplus_{i \in \Lambda_{\hat{k}}} S_i$. The structure of the proof is as follows. We show that $x$ can be written as $x = B_k a_k$, where for the solution $\breve{c}^*$ of (30), we have $\sum_{i \in \Lambda_k} \|\breve{c}^*[i]\|_q \le \|a_k\|_1$. Also, we show that for the solution $\hat{c}^*$ of (31), one can write $x = B_{\hat{k}} a_{\hat{k}}$, where $\|a_{\hat{k}}\|_1 = \sum_{i \in \Lambda_{\hat{k}}} \|\hat{c}^*[i]\|_q$. Under the sufficient condition of the Proposition, we show that $\|a_k\|_1 < \|a_{\hat{k}}\|_1$, implying that the condition of Theorem 1 is satisfied.

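To start, let $\hat{c}^*$ be the solution of the optimization program in (31). For every $i \in \Lambda_{\hat{k}}$, define the vectors $\hat{s}_i$ and the scalars $\hat{a}_i$ as follows. If $\hat{c}^*[i] \neq 0$ and $B[i]\hat{c}^*[i] \neq 0$, let

$$\hat{s}_i \triangleq \frac{B[i]\hat{c}^*[i]}{\|\hat{c}^*[i]\|_q}, \qquad \hat{a}_i \triangleq \|\hat{c}^*[i]\|_q. \qquad (41)$$

Otherwise, let $\hat{s}_i$ be an arbitrary vector in $S_i$ of unit Euclidean norm and $\hat{a}_i = 0$. We can write

$$x = \sum_{i \in \Lambda_{\hat{k}}} B[i]\hat{c}^*[i] = B_{\hat{k}}\, a_{\hat{k}}, \qquad (42)$$

where $B_{\hat{k}} \triangleq [\hat{s}_{i_{k+1}} \ \cdots \ \hat{s}_{i_n}]$ and $a_{\hat{k}} \triangleq [\hat{a}_{i_{k+1}} \ \cdots \ \hat{a}_{i_n}]^\top$. Note that from Definition 2 we have $\|\hat{s}_i\|_2 \le \sqrt{1+\sigma_q}$ for every $i \in \Lambda_{\hat{k}}$.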

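Let $\bar{B}[i] \in \mathbb{R}^{D \times d_i}$ be the submatrix of $B[i]$ associated with $\epsilon_q$ according to Definition 1. Since $\bar{B}[i]$ spans the subspace $S_i$, there exists $\bar{c}[i]$ such that

$$x = \sum_{i \in \Lambda_k} \bar{B}[i]\bar{c}[i] \triangleq B_k a_k, \qquad (43)$$

where $B_k \triangleq [\bar{s}_{i_1} \ \cdots \ \bar{s}_{i_k}]$ and $a_k \triangleq [a_{i_1} \ \cdots \ a_{i_k}]^\top$. For every $i \in \Lambda_k$ the vectors $\bar{s}_i$ and the scalars $a_i$ are defined as

$$\bar{s}_i \triangleq \frac{\bar{B}[i]\bar{c}[i]}{\|\bar{c}[i]\|_q}, \qquad a_i \triangleq \|\bar{c}[i]\|_q, \qquad (44)$$

whenever $\bar{c}[i] \neq 0$ and $\bar{B}[i]\bar{c}[i] \neq 0$. Otherwise, we let $\bar{s}_i$ be an arbitrary vector in $S_i$ of unit Euclidean norm and $a_i = 0$. Clearly, $B_k \in \mathbb{B}_{\epsilon_q}(\Lambda_k)$ is full column-rank using Corollary 1 when $\epsilon_q \in [0, 1)$. Hence, we have $a_k = (B_k^\top B_k)^{-1} B_k^\top x$ and consequently,

$$\|a_k\|_1 = \|(B_k^\top B_k)^{-1} B_k^\top x\|_1. \qquad (45)$$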

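Substituting $x$ from (42) in the above equation, we obtain

$$\|a_k\|_1 \;\le\; \|(B_k^\top B_k)^{-1} B_k^\top B_{\hat{k}}\|_{1,1}\, \|a_{\hat{k}}\|_1. \qquad (46)$$

Using Lemma 2 with $\alpha = \epsilon_q$ and $\beta = \sigma_q$, we have

$$\|(B_k^\top B_k)^{-1} B_k^\top B_{\hat{k}}\|_{1,1} \;\le\; \frac{\sqrt{(1+\epsilon_q)(1+\sigma_q)}\,\zeta_k}{1 - [\epsilon_q + (1+\epsilon_q)\zeta_{k-1}]}. \qquad (47)$$

Thus, if the right hand side of the above equation is strictly less than one, i.e., if the condition of the proposition is satisfied, then from (46) we have $\|a_k\|_1 < \|a_{\hat{k}}\|_1$. Finally, using the optimality of $\breve{c}^*$ when we choose the blocks indexed by $\Lambda_k$, we obtain

$$\sum_{i \in \Lambda_k} \|\breve{c}^*[i]\|_q \;\le\; \sum_{i \in \Lambda_k} \|\bar{c}[i]\|_q = \|a_k\|_1 \;<\; \|a_{\hat{k}}\|_1 = \sum_{i \in \Lambda_{\hat{k}}} \|\hat{c}^*[i]\|_q, \qquad (48)$$

which implies that the condition of Theorem 1 is satisfied. Thus, the convex program $P_{\ell_q/\ell_1}$ recovers a $k$-block-sparse representation of a given signal. ■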
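As a short worked step, the requirement that the bound from Lemma 2 (with $\alpha = \epsilon_q$, $\beta = \sigma_q$) be strictly less than one can be rearranged into the condition of Proposition 3. The sketch below assumes the denominator $1 - [\epsilon_q + (1+\epsilon_q)\zeta_{k-1}]$ is positive, which the final inequality itself guarantees because it forces $(1+\epsilon_q)\zeta_{k-1} < 1-\epsilon_q$.

```latex
\begin{align*}
\frac{\sqrt{(1+\epsilon_q)(1+\sigma_q)}\,\zeta_k}{1-[\epsilon_q+(1+\epsilon_q)\zeta_{k-1}]} < 1
&\iff \sqrt{(1+\epsilon_q)(1+\sigma_q)}\,\zeta_k + (1+\epsilon_q)\,\zeta_{k-1} < 1-\epsilon_q \\
&\iff \sqrt{\frac{1+\sigma_q}{1+\epsilon_q}}\;\zeta_k + \zeta_{k-1} < \frac{1-\epsilon_q}{1+\epsilon_q}.
\end{align*}
```

The second line follows by dividing both sides by $1+\epsilon_q > 0$.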

The following corollary derives stronger, but simpler to check, sufficient conditions for block-sparse recovery using $P_{\ell_q/\ell_1}$.

Corollary 2: For signals that have a unique $k$-block-sparse representation in $B$, the solution of the optimization program $P_{\ell_q/\ell_1}$ is equivalent to that of $P_{\ell_q/\ell_0}$, if^3

$$\left( k\sqrt{\frac{1+\sigma_q}{1+\epsilon_q}} + k - 1 \right)\mu_S \;<\; \frac{1-\epsilon_q}{1+\epsilon_q}. \qquad (49)$$

Proof: The result follows from Proposition 3 by using the fact that $\zeta_k \le k\mu_S$ from Lemma 1. ■
For non-redundant blocks, we have $\sigma_q = \epsilon_q$. Thus, in this case, for the convex program $P_{\ell_q/\ell_1}$, the block-sparse recovery condition based on the mutual subspace coherence in (49) reduces to

$$(2k-1)\,\mu_S \;<\; \frac{1-\epsilon_q}{1+\epsilon_q}. \qquad (50)$$

Also, for non-redundant blocks with $q = 2$, $\epsilon_2$ coincides with the 1-block restricted isometry constant $\delta_{B,1}$ defined in (13). In this case, note that our result in (50) is different from the result of [29] stated in (12). First, the notion of mutual subspace coherence is different from the notion of block coherence because they are defined as the largest singular values of two different matrices. Second, the bound on the right hand side of (50) is a function of the best intra-block $q$-restricted isometry constant of $B$, while the right hand side of (12) is a function of the maximum mutual-coherence over all blocks of $B$.

For non-redundant blocks, the block-sparse recovery condition based on the cumulative subspace coherence in (40) reduces to

$$\zeta_k + \zeta_{k-1} \;<\; \frac{1-\epsilon_q}{1+\epsilon_q}, \qquad (51)$$

which is always weaker than the condition based on the mutual subspace coherence in (50). In addition, when $q = 2$, (51) provides a weaker condition than the one in (12) which is based on the notion of block-coherence [29].
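The coherence-based conditions for non-redundant blocks can be evaluated numerically. The following is a minimal NumPy sketch, not the authors' code. It assumes that the subspace coherence $\mu(S_i, S_j)$ is computed as the largest singular value of $A_i^\top A_j$ for orthonormal bases $A_i$, $A_j$ (the cosine of the smallest principal angle), and that the intra-block constant is supplied by the caller as the hypothetical parameter `eps_q`. All function names are illustrative.

```python
import itertools
import numpy as np


def subspace_coherence(A_i, A_j):
    """Largest singular value of A_i^T A_j (cosine of the smallest principal angle).

    Assumes A_i and A_j have orthonormal columns spanning the two subspaces.
    """
    return float(np.linalg.svd(A_i.T @ A_j, compute_uv=False)[0])


def coherence_matrix(bases):
    """Symmetric matrix of pairwise subspace coherences with a zero diagonal."""
    n = len(bases)
    M = np.zeros((n, n))
    for i, j in itertools.combinations(range(n), 2):
        M[i, j] = M[j, i] = subspace_coherence(bases[i], bases[j])
    return M


def cumulative_coherence(M, k):
    """zeta_k: max over i of the sum of the k largest coherences mu(S_i, S_j), j != i."""
    if k <= 0:
        return 0.0
    n = M.shape[0]
    return float(max(np.sort(np.delete(M[i], i))[-k:].sum() for i in range(n)))


def recovery_conditions(M, k, eps_q):
    """Evaluate the mutual-coherence and cumulative-coherence conditions
    (2k-1) mu_S < (1-eps)/(1+eps) and zeta_k + zeta_{k-1} < (1-eps)/(1+eps)."""
    rhs = (1.0 - eps_q) / (1.0 + eps_q)
    mu_s = float(M.max())  # the diagonal is zero, so this is the max over i != j
    cond_mutual = (2 * k - 1) * mu_s < rhs
    cond_cumulative = cumulative_coherence(M, k) + cumulative_coherence(M, k - 1) < rhs
    return cond_mutual, cond_cumulative


# Example: five random 2-dimensional subspaces of R^100.
rng = np.random.default_rng(0)
bases = [np.linalg.qr(rng.standard_normal((100, 2)))[0] for _ in range(5)]
M = coherence_matrix(bases)
print(recovery_conditions(M, k=2, eps_q=0.0))
```

Because $\zeta_k + \zeta_{k-1} \le (2k-1)\mu_S$, the cumulative-coherence condition holds whenever the mutual-coherence condition does, which is the sense in which it is the weaker requirement.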

V. Block-Sparse Recovery via $P'_{\ell_q/\ell_1}$

In this section, we consider the problem of block-sparse recovery from the non-convex optimization program

$$P'_{\ell_q/\ell_0}: \ \min \sum_{i=1}^{n} I(\|B[i]c[i]\|_q > 0) \quad \text{s.t.} \quad y = Bc.$$

Unlike $P_{\ell_q/\ell_0}$ that penalizes the norm of the coefficient blocks, $P'_{\ell_q/\ell_0}$ penalizes the norm of the reconstructed vectors from the blocks. Hence, $P'_{\ell_q/\ell_0}$ finds a solution that has the minimum number of nonzero vectors $B[i]c[i] \in S_i$. For a dictionary with non-redundant blocks, the solution of $P'_{\ell_q/\ell_0}$ has also the minimum number of nonzero coefficient blocks. However, this does not necessarily hold for a dictionary with redundant blocks since a nonzero $c[i]$ in the nullspace of the non-contributing blocks ($B[i]c[i] = 0$) does not affect either the value of the cost function or the equality constraint in $P'_{\ell_q/\ell_0}$.

^3 An intermediate sufficient condition is given by $\sqrt{\frac{1+\sigma_q}{1+\epsilon_q}}\,u_k + u_{k-1} < \frac{1-\epsilon_q}{1+\epsilon_q}$, using the fact that $\zeta_k \le u_k \le k\mu_S$.

Despite the above argument, we consider $P'_{\ell_q/\ell_0}$ and its convex relaxation for block-sparse recovery in generic dictionaries with non-redundant or redundant blocks, because one can simply set to zero the nonzero blocks $c^*[i]$ for which $B[i]c^*[i]$ is zero.^4 Another reason, which we explain in more detail in Section VII, comes from the fact that in some tasks such as classification, we are mainly concerned with finding the contributing blocks rather than being concerned with the representation itself.
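The post-processing step described above, zeroing coefficient blocks whose reconstructed vector vanishes, can be sketched as follows. This is an illustrative NumPy sketch rather than the authors' implementation. The function name, the list-of-blocks data layout, and the default tolerance `tol` are assumptions made here.

```python
import numpy as np


def prune_noncontributing_blocks(blocks, coeffs, q=2, tol=1e-8):
    """Return a copy of the coefficient blocks in which c[i] is set to zero
    whenever ||B[i] c[i]||_q <= tol, i.e. whenever block i does not contribute.

    blocks: list of D x m_i arrays B[i]; coeffs: list of length-m_i arrays c[i].
    """
    pruned = []
    for B_i, c_i in zip(blocks, coeffs):
        if np.linalg.norm(B_i @ c_i, ord=q) <= tol:
            pruned.append(np.zeros_like(c_i))
        else:
            pruned.append(c_i.copy())
    return pruned


# Example: B1 is a redundant block (3 atoms spanning a 2-D subspace of R^3),
# and c1 lies in its nullspace, so block 1 contributes nothing to y.
B1 = np.array([[1.0, 0.0, 1.0],
               [0.0, 1.0, 1.0],
               [0.0, 0.0, 0.0]])
B2 = np.array([[0.0], [0.0], [1.0]])
c1 = np.array([1.0, 1.0, -1.0])   # B1 @ c1 == 0
c2 = np.array([2.0])
blocks, coeffs = [B1, B2], [c1, c2]

y = sum(B @ c for B, c in zip(blocks, coeffs))
pruned = prune_noncontributing_blocks(blocks, coeffs)
y_pruned = sum(B @ c for B, c in zip(blocks, pruned))
print(pruned, np.allclose(y, y_pruned))
```

For noisy data, a larger `tol` plays the role of the threshold on $\|B[i]c^*[i]\|_q$.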

Since $P'_{\ell_q/\ell_0}$ is NP-hard, we consider its $\ell_1$ relaxation

$$P'_{\ell_q/\ell_1}: \ \min \sum_{i=1}^{n} \|B[i]c[i]\|_q \quad \text{s.t.} \quad y = Bc,$$

which is a convex program for $q \ge 1$. Our approach to guarantee the equivalence of $P'_{\ell_q/\ell_0}$ and $P'_{\ell_q/\ell_1}$ is similar to the one in the previous section. Let $\Lambda_k$ be a set of $k$ indices from $\{1, \ldots, n\}$ and $\Lambda_{\hat{k}}$ be the set of the remaining indices. For a nonzero signal $x$ in the intersection of $\oplus_{i \in \Lambda_k} S_i$ and $\oplus_{i \in \Lambda_{\hat{k}}} S_i$, let the minimum $\ell_q/\ell_1$-norm coefficient vector when we choose only the blocks of $B$ indexed by $\Lambda_k$ be

$$\breve{c}^* = \operatorname{argmin} \sum_{i \in \Lambda_k} \|B[i]c[i]\|_q \quad \text{s.t.} \quad x = \sum_{i \in \Lambda_k} B[i]c[i]. \qquad (52)$$

Also, let the minimum $\ell_q/\ell_1$-norm coefficient vector when we choose the blocks of $B$ indexed by $\Lambda_{\hat{k}}$ be

$$\hat{c}^* = \operatorname{argmin} \sum_{i \in \Lambda_{\hat{k}}} \|B[i]c[i]\|_q \quad \text{s.t.} \quad x = \sum_{i \in \Lambda_{\hat{k}}} B[i]c[i]. \qquad (53)$$

We have the following result.

Theorem 2: For all signals that have a unique $k$-block-sparse representation in $B$, the solution of the optimization program $P'_{\ell_q/\ell_1}$ is equivalent to that of $P'_{\ell_q/\ell_0}$ if and only if

$$\forall \Lambda_k, \ \forall x \in (\oplus_{i \in \Lambda_k} S_i) \cap (\oplus_{i \in \Lambda_{\hat{k}}} S_i), \ x \neq 0 \ \Longrightarrow \ \sum_{i \in \Lambda_k} \|B[i]\breve{c}^*[i]\|_q < \sum_{i \in \Lambda_{\hat{k}}} \|B[i]\hat{c}^*[i]\|_q. \qquad (54)$$

Proof: ($\Longleftarrow$) Let $y$ be a signal that lives in the subspace $\oplus_{i \in \Lambda_k} S_i$. Denote by $c^*$ the solution of the optimization program $P'_{\ell_q/\ell_1}$. If for at most $k$ blocks of $c^*$ we have $B[i]c^*[i] \neq 0$, then by the uniqueness assumption, these blocks of $c^*$ are indexed by $\Lambda_k$. For the sake of contradiction, assume that $B[i]c^*[i] \neq 0$ for some $i \in \Lambda_{\hat{k}}$. Define

$$x \triangleq y - \sum_{i \in \Lambda_k} B[i]c^*[i] = \sum_{i \in \Lambda_{\hat{k}}} B[i]c^*[i]. \qquad (55)$$

The remaining steps of the proof are analogous to the proof of Theorem 1 except that we replace $\|c^*[i]\|_q$ by $\|B[i]c^*[i]\|_q$ in (35) and use the triangle inequality for $\|B[i](c^*[i] + \breve{c}^*[i])\|_q$.

($\Longrightarrow$) We prove this using contradiction. Assume that there exist $\Lambda_k$ and $x$ in the intersection of $\oplus_{i \in \Lambda_k} S_i$ and $\oplus_{i \in \Lambda_{\hat{k}}} S_i$ for which the condition in (54) does not hold, i.e.,

$$\sum_{i \in \Lambda_{\hat{k}}} \|B[i]\hat{c}^*[i]\|_q \;\le\; \sum_{i \in \Lambda_k} \|B[i]\breve{c}^*[i]\|_q. \qquad (56)$$

Thus, a solution of $x = Bc$ is given by $\hat{c}^*$ that is not $k$-block-sparse and whose linear transformation by $B$ has an $\ell_q/\ell_1$-norm that is smaller than or equal to the norm of the transformation by $B$ of any $k$-block-sparse solution, contradicting the equivalence of $P'_{\ell_q/\ell_1}$ and $P'_{\ell_q/\ell_0}$. ■

Next, we propose sufficient conditions that incorporate the inter-block and intra-block parameters of the dictionary and can be efficiently checked. Before that we need to introduce the following notation.

Definition 6: Consider a dictionary $B$ with blocks $B[i] \in \mathbb{R}^{D \times m_i}$. Define $\epsilon'_q$ as the smallest constant such that for every $i$ and $c[i]$ we have

$$(1-\epsilon'_q)\,\|B[i]c[i]\|_q^2 \;\le\; \|B[i]c[i]\|_2^2 \;\le\; (1+\epsilon'_q)\,\|B[i]c[i]\|_q^2. \qquad (57)$$

Note that $\epsilon'_q$ characterizes the relation between the $\ell_q$ and $\ell_2$ norms of vectors in $\mathbb{R}^D$ and does not depend on whether the blocks are non-redundant or redundant. In addition, for $q = 2$, we have $\epsilon'_2 = 0$.

Proposition 4: For signals that have a unique $k$-block-sparse representation in $B$, the solution of the optimization program $P'_{\ell_q/\ell_1}$ is equivalent to that of $P'_{\ell_q/\ell_0}$, if

$$\zeta_k + \zeta_{k-1} \;<\; \frac{1-\epsilon'_q}{1+\epsilon'_q}. \qquad (58)$$

Proof: The proof is provided in the Appendix. ■

The following corollary derives stronger yet simpler to check sufficient conditions for block-sparse recovery using $P'_{\ell_q/\ell_1}$.

Corollary 3: For signals that have a unique $k$-block-sparse representation in $B$, the solution of the optimization program $P'_{\ell_q/\ell_1}$ is equivalent to that of $P'_{\ell_q/\ell_0}$, if^5

$$(2k-1)\,\mu_S \;<\; \frac{1-\epsilon'_q}{1+\epsilon'_q}. \qquad (59)$$

Proof: The result follows from Proposition 4 by using the fact that $\zeta_k \le k\mu_S$ from Lemma 1. ■

Unlike the conditions for the equivalence between $P_{\ell_q/\ell_1}$ and $P_{\ell_q/\ell_0}$, which depend on whether the blocks are non-redundant or redundant, the conditions for the equivalence between $P'_{\ell_q/\ell_1}$ and $P'_{\ell_q/\ell_0}$ do not depend on the redundancy of the blocks. In addition, since $\epsilon'_2 = 0$, the condition for the equivalence between $P'_{\ell_2/\ell_1}$ and $P'_{\ell_2/\ell_0}$ based on the mutual subspace coherence reduces to

$$(2k-1)\,\mu_S < 1, \qquad (60)$$

and the condition based on the cumulative subspace coherence reduces to

$$\zeta_k + \zeta_{k-1} < 1. \qquad (61)$$

^4 For noisy data, this can be modified by setting to zero the blocks $c^*[i]$ for which $\|B[i]c^*[i]\|_q$ is smaller than a threshold. In practice, to prevent overfitting, one has to add a small regularization on the coefficients to the optimization program.

^5 An intermediate sufficient condition is given by $u_k + u_{k-1} < \frac{1-\epsilon'_q}{1+\epsilon'_q}$, using the fact that $\zeta_k \le u_k \le k\mu_S$.



Remark 4: Note that the sufficient conditions in (60) and (61) are weaker than the sufficient conditions in (50) and (51), respectively. While we cannot assert the superiority of $P'_{\ell_2/\ell_1}$ over $P_{\ell_2/\ell_1}$, since the conditions are only sufficient, not necessary, as we will show in the experimental results $P'_{\ell_q/\ell_1}$ is in general more successful than $P_{\ell_q/\ell_1}$ for block-sparse recovery.

Remark 5: Under the uniqueness assumption, both nonconvex programs $P_{\ell_q/\ell_0}$ and $P'_{\ell_q/\ell_0}$ find the unique blocks $\Lambda_k$ and the vectors $\{s_i \in S_i\}_{i \in \Lambda_k}$ for which $y = \sum_{i \in \Lambda_k} s_i$. Thus, when the conditions for the success of the convex programs $P_{\ell_q/\ell_1}$ and $P'_{\ell_q/\ell_1}$ hold, their optimal solutions correspond to $\Lambda_k$ and $\{s_i \in S_i\}_{i \in \Lambda_k}$. For non-redundant blocks, this implies that the optimal coefficient vectors found by $P_{\ell_q/\ell_1}$ and $P'_{\ell_q/\ell_1}$ are the same and equal to the true solution.



VI. Correcting Sparse Outlying Entries 

In real-world problems, observed signals might be corrupted by errors [14], [38], hence might not perfectly lie in the range-space of a few blocks of the dictionary [15]. A case of interest, which also happens in practice, is when the observed signal is corrupted with an error that has a few outlying entries. For example, in the face recognition problem, a face image might be corrupted because of occlusions [14], or in the motion segmentation problem, some of the entries of feature trajectories might be corrupted due to objects partially occluding each other or malfunctioning of the tracker [33], [15]. In such cases, the observed signal $y$ can be modeled as a superposition of a pure signal $y_0$ and a corruption term $e$ of the form $y = y_0 + e$, where $y_0$ has a block-sparse representation in the dictionary $B$ and $e$ has a few large nonzero entries. Thus, $y$ can be written as

$$y = y_0 + e = Bc + e = [\,B \ \ I\,] \begin{bmatrix} c \\ e \end{bmatrix}, \qquad (62)$$

where $I$ denotes the identity matrix. Note that the new dictionary $[\,B \ \ I\,]$ still has a block structure whose blocks correspond to the blocks of $B$ and the atoms of $I$. Thus, in this new dictionary, $y$ has a block-sparse representation with a few blocks corresponding to $B$ and a few blocks/atoms corresponding to $I$. Assuming that the sufficient conditions of the previous sections hold for the dictionary $[\,B \ \ I\,]$, we can recover a block-sparse representation of a corrupted signal using the convex optimization program $P_{\ell_q/\ell_1}$ as

$$P_{\ell_q/\ell_1}: \ \min \sum_{i=1}^{n} \|c[i]\|_q + \|e\|_1 \quad \text{s.t.} \quad y = [\,B \ \ I\,] \begin{bmatrix} c \\ e \end{bmatrix}, \qquad (63)$$

or using the convex optimization program $P'_{\ell_q/\ell_1}$ as

$$P'_{\ell_q/\ell_1}: \ \min \sum_{i=1}^{n} \|B[i]c[i]\|_q + \|e\|_1 \quad \text{s.t.} \quad y = [\,B \ \ I\,] \begin{bmatrix} c \\ e \end{bmatrix}. \qquad (64)$$

Here, we used the fact that the blocks of $I$ are of length one, i.e., $e[i] \in \mathbb{R}$. Thus, $\sum_{i=1}^{D} \|e[i]\|_q = \sum_{i=1}^{D} |e[i]| = \|e\|_1$.
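The extended dictionary and the identity between block norms and the $\ell_1$-norm of the error can be checked numerically. The snippet below is a small NumPy sketch with made-up dimensions and values. The helper name `extend_with_identity` is hypothetical, and no optimization program is solved here.

```python
import numpy as np


def extend_with_identity(B):
    """Form the extended dictionary [B  I], where I is the D x D identity."""
    D = B.shape[0]
    return np.hstack([B, np.eye(D)])


rng = np.random.default_rng(0)
D, N = 6, 8
B = rng.standard_normal((D, N))

c = np.zeros(N)
c[:2] = [1.0, -2.0]          # coefficients of the clean signal y0 = B c
e = np.zeros(D)
e[3] = 5.0                   # a single large outlying entry
y = B @ c + e

B_ext = extend_with_identity(B)
z = np.concatenate([c, e])   # stacked vector [c; e]
print(np.allclose(B_ext @ z, y))

# Each block of I has length one, so for any q the block-wise q-norms of e
# sum to the l1-norm of e.
for q in (1, 2, np.inf):
    block_norm_sum = sum(np.linalg.norm(e[i:i + 1], ord=q) for i in range(D))
    print(q, np.isclose(block_norm_sum, np.abs(e).sum()))
```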

As a result, this paper not only proposes two classes of convex programs that can be used to deal with block-sparse recovery of corrupted signals, but also provides theoretical guarantees under which one can successfully recover the block-sparse representation of a corrupted signal and eliminate the error from the signal.^6

VII. Experimental Results 

In this section, we evaluate the performance of the two 
classes of convex programs for recovering block-sparse rep- 
resentations of signals. We evaluate the performance of the 
convex programs through synthetic experiments as well as real 
experiments on the face recognition problem. 

A. Synthetic Experiments 

We consider the problem of finding block-sparse representations of signals in dictionaries whose atoms are drawn from a union of disjoint subspaces. We investigate the performance of the two classes of convex programs for various block-sparsity levels.

For simplicity, we assume that all the subspaces have the same dimension $d$ and that the blocks have the same length $m$. First, we generate random bases $A_i \in \mathbb{R}^{D \times d}$ for $n$ disjoint subspaces $\{S_i\}_{i=1}^{n}$ in $\mathbb{R}^D$ by orthonormalizing i.i.d. Gaussian matrices where the elements of each matrix are drawn independently from the standard Gaussian distribution.^7 Next, using the subspace bases, we draw $m \in \{d, 2d\}$ random vectors in each subspace to form blocks $B[i] \in \mathbb{R}^{D \times m}$. For a fixed block-sparsity level $k$, we generate a signal $y \in \mathbb{R}^D$ using a random $k$-block-sparse vector $c^0 \in \mathbb{R}^{nm}$ where the $k$ nonzero blocks, $\Lambda_k$, are chosen uniformly at random from the $n$ blocks and the coefficients in the nonzero blocks are i.i.d. and drawn from the standard Gaussian distribution.
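The data-generation procedure above can be sketched in NumPy as follows. This is an illustrative reconstruction, not the authors' script. In particular, it assumes that the $m$ random vectors in each subspace are obtained by multiplying the basis by a standard Gaussian coefficient matrix, and the function names are hypothetical.

```python
import itertools
import numpy as np


def generate_blocks(n, D, d, m, rng):
    """Random orthonormal bases A_i (D x d) and blocks B[i] = A_i G_i (D x m)."""
    bases, blocks = [], []
    for _ in range(n):
        A, _ = np.linalg.qr(rng.standard_normal((D, d)))  # orthonormalize a Gaussian matrix
        bases.append(A)
        blocks.append(A @ rng.standard_normal((d, m)))     # assumed way of drawing atoms in S_i
    return bases, blocks


def pairwise_disjoint(bases):
    """Check that every concatenated pair [A_i A_j] has full column rank."""
    return all(
        np.linalg.matrix_rank(np.hstack([A_i, A_j])) == A_i.shape[1] + A_j.shape[1]
        for A_i, A_j in itertools.combinations(bases, 2)
    )


def generate_signal(blocks, k, rng):
    """Draw a k-block-sparse c0 with Gaussian nonzero blocks and return y = B c0."""
    n, m = len(blocks), blocks[0].shape[1]
    support = np.sort(rng.choice(n, size=k, replace=False))
    c0 = np.zeros(n * m)
    for i in support:
        c0[i * m:(i + 1) * m] = rng.standard_normal(m)
    y = np.hstack(blocks) @ c0
    return y, c0, support


rng = np.random.default_rng(1)
bases, blocks = generate_blocks(n=6, D=20, d=2, m=4, rng=rng)
y, c0, support = generate_signal(blocks, k=2, rng=rng)
print(pairwise_disjoint(bases), support)
```

Setting `m = d` gives non-redundant blocks and `m = 2 * d` gives redundant ones, matching the two cases considered in the experiments.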

For each class of the convex programs $P_{\ell_q/\ell_1}$ and $P'_{\ell_q/\ell_1}$ with $q \in \{1, 2, \infty\}$, we measure the following errors. The reconstruction error measures how well a signal $y$ can be reconstructed from the blocks of the optimal solution $c^*$ corresponding to the correct support $\Lambda_k$ and is defined as

$$\text{reconstruction error} = \frac{\|y - \sum_{i \in \Lambda_k} B[i]c^*[i]\|_2}{\|y\|_2}. \qquad (65)$$

Ideally, if an optimization algorithm is successful in recovering the correct vector in each subspace, i.e., $B[i]c^*[i] = B[i]c^0[i]$ for all $i$, then the reconstruction error is zero. As we expect that the contribution of the blocks corresponding to $\Lambda_{\hat{k}}$ to the reconstruction of the given signal be zero, i.e., $B[i]c^*[i] = 0$, we measure the block-contribution error as

$$\text{block contribution error} = 1 - \frac{\sum_{i \in \Lambda_k} \|B[i]c^*[i]\|_2}{\sum_{i=1}^{n} \|B[i]c^*[i]\|_2} \in [0, 1]. \qquad (66)$$



^6 Note that the result can be easily generalized to the case where the error $e$ has a sparse representation in a dictionary $G$ instead of $I$ by considering the dictionary $[\,B \ \ G\,]$ in (62).

^7 In order to ensure that the generated bases correspond to disjoint subspaces, we check that the concatenation of each pair of bases is full column-rank.
Fig. 3. Errors of the convex programs on synthetic data with n = 40, D = 100. Reconstruction error (left), block-contribution error (middle) and coefficient recovery error (right) for non-redundant blocks with m = d = 4. [Three panels titled "Non-Redundant Dictionary"; horizontal axis: block sparsity level (k).]

Fig. 4. Errors of the convex programs on synthetic data with n = 40, D = 100. Reconstruction error (left) and block-contribution error (right) for redundant blocks with m = 2d = 8. [Two panels titled "Redundant Dictionary"; horizontal axis: block sparsity level (k).]



The error is equal to zero when all contributing blocks correspond to $\Lambda_k$ and it is equal to one when all contributing blocks correspond to $\Lambda_{\hat{k}}$. For non-redundant blocks, since $c^0$ is the unique $k$-block-sparse vector such that $y = Bc^0$, we can also measure the coefficient recovery error as

$$\text{coefficient recovery error} = \frac{\|c^* - c^0\|_2}{\|c^0\|_2}. \qquad (67)$$

We generate $L_1 = 200$ different sets of $n = 40$ blocks in $\mathbb{R}^{100}$ and for each set of $n$ blocks we generate $L_2 = 100$ different block-sparse signals. For a fixed block-sparsity level, we compute the average of the above errors for each optimization program over $L = L_1 \times L_2 = 20{,}000$ trials.^8
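The three error measures used in the synthetic experiments can be sketched in NumPy as below. This is an illustrative sketch, not the authors' code. It assumes the reconstruction error is normalized by $\|y\|_2$, represents coefficient vectors as lists of per-block arrays, and returns a block-contribution error of zero when no block contributes at all (an edge case chosen here for convenience).

```python
import numpy as np


def reconstruction_error(y, blocks, c_star, support):
    """||y - sum_{i in support} B[i] c*[i]||_2 / ||y||_2 (normalization assumed)."""
    rec = sum(blocks[i] @ c_star[i] for i in support)
    return float(np.linalg.norm(y - rec) / np.linalg.norm(y))


def block_contribution_error(blocks, c_star, support):
    """1 - (energy reconstructed on the true support) / (energy over all blocks)."""
    norms = np.array([np.linalg.norm(B @ c) for B, c in zip(blocks, c_star)])
    total = norms.sum()
    if total == 0.0:
        return 0.0  # convention used in this sketch when nothing contributes
    return float(1.0 - norms[list(support)].sum() / total)


def coefficient_recovery_error(c_star, c0):
    """||c* - c0||_2 / ||c0||_2, meaningful for non-redundant blocks."""
    diff = np.concatenate(c_star) - np.concatenate(c0)
    return float(np.linalg.norm(diff) / np.linalg.norm(np.concatenate(c0)))


# Toy example with two orthogonal 2-D blocks in R^4 and true support {0}.
blocks = [np.eye(4)[:, :2], np.eye(4)[:, 2:]]
c0 = [np.array([1.0, 2.0]), np.zeros(2)]
y = sum(B @ c for B, c in zip(blocks, c0))
support = [0]

c_bad = [np.zeros(2), np.array([1.0, 2.0])]  # all energy on the wrong block
print(reconstruction_error(y, blocks, c0, support),
      block_contribution_error(blocks, c0, support),
      coefficient_recovery_error(c0, c0))
print(reconstruction_error(y, blocks, c_bad, support),
      block_contribution_error(blocks, c_bad, support))
```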

Figure 3 shows the average errors for various block-sparsity levels for non-redundant blocks where $m = d = 4$. As the results show, for a fixed value of $q$, $P'_{\ell_q/\ell_1}$ obtains lower reconstruction, block-contribution, and coefficient recovery errors than $P_{\ell_q/\ell_1}$ for all block-sparsity levels. Moreover, while the performance of $P_{\ell_q/\ell_1}$ significantly degrades for block-sparsity levels greater than 3, $P'_{\ell_q/\ell_1}$ maintains a high performance for a wider range of block-sparsity levels.

Figure 4 shows the average errors for various block-sparsity levels for redundant blocks with $m = 2d = 8$. Similar to the previous case, for a fixed $q$, $P'_{\ell_q/\ell_1}$ has a higher performance than $P_{\ell_q/\ell_1}$ for all block-sparsity levels. Note that redundancy in the blocks improves the performance of $P_{\ell_q/\ell_1}$. Specifically, compared to the case of non-redundant blocks, the performance of $P_{\ell_q/\ell_1}$ degrades at higher sparsity levels.

^8 In order to solve the convex programs, we use the CVX package which can be downloaded from http://cvxr.com/cvx

An interesting observation from the results of Figures 3 and 4 is that for each class of convex programs, the case of $q = \infty$ either has a lower performance or degrades at lower block-sparsity levels than $q = 1, 2$. In addition, the case of $q = 2$ in general performs better than $q = 1$.

B. Face Recognition 

In this part, we evaluate the performance of the block-sparse recovery algorithms in the real problem of automatic face recognition. Assume we are given a collection of $mn$ face images of $n$ subjects acquired under the same pose and varying illumination. Under the Lambertian assumption, [34] shows that the face images of each subject live close to a linear subspace of dimension $d = 9$. Thus, the collection of faces of different subjects lives close to a union of 9-dimensional subspaces. Let $b_{ij} \in \mathbb{R}^D$ denote the $j$-th training image for the $i$-th subject converted into a vector. We denote the collection of $m$ faces for the $i$-th subject as

$$B[i] \triangleq [\,b_{i1} \ \ b_{i2} \ \cdots \ b_{im}\,] \in \mathbb{R}^{D \times m}. \qquad (68)$$

Thus, the dictionary $B$ consists of the training images of the $n$ subjects. In this dictionary, a new face vector, $y \in \mathbb{R}^D$, which belongs to the $i$-th subject, can be written as a linear combination of face vectors from the $i$-th block. However, in reality, a face image is corrupted with cast shadows and specularities. In other words, the columns of $B$ are corrupted by errors and do not perfectly lie in a low-dimensional subspace. Thus, in the optimization programs, instead of the exact equality constraint $y = Bc$, we use the constraint $\|y - Bc\|_2 \le \delta$.^9 Following [14], we can find the subject to which $y$ belongs from

$$\text{identity}(y) = \operatorname{argmin}_i \|y - B[i]c^*[i]\|_2. \qquad (69)$$

Fig. 5. Top: sample face images from four subjects in the Extended Yale B dataset. Bottom: classification rates for the convex programs on the Extended Yale B database with n = 38 and D = 132 as a function of the number of training data in each class. Left: using eigen-faces. Middle: using random projections. Right: using down-sampling.

We evaluate the performance of each one of the above optimization programs on the Extended Yale B dataset [39], a few images of which are shown in Figure 5. The dataset consists of 2,414 cropped frontal face images of $n = 38$ individuals. For each subject, there are approximately 64 face images of size $192 \times 168 = 32{,}256$, which are captured under various laboratory-controlled lighting conditions. Since the dimension of the original face vectors is very large, we reduce the dimension of the data using the following methods:

- We use the eigenfaces approach [40] by projecting the face vectors to the first $D$ principal components of the training data covariance matrix.

- We multiply the face vectors by a random projection matrix $\Phi \in \mathbb{R}^{D \times 32{,}256}$, which has i.i.d. entries drawn from a zero mean Gaussian distribution with variance $\frac{1}{D}$ [41], [14].

- We down-sample the images by a factor $r$ such that the dimension of the down-sampled face vectors is $D$.

In the experiments, we set $D = 132$. For each subject, we randomly select $m \in \{9, 18, 25, 32\}$ training images, to form the blocks $B[i] \in \mathbb{R}^{D \times m}$ and use the remaining images for testing. For every test image, we solve each class of the convex programs for $q \in \{1, 2\}$ and determine the identity of the test image using (69).^10 We compute the classification rate as the average number of correctly classified test images for which the recovered identity matches the ground-truth. We repeat this experiment $L = 20$ times for random choices of $m$ training data for each subject and compute the mean classification rate among all the trials. We compare our results with the nearest subspace (NS) method [42] as well as the Linear SVM classifier [43].

^9 In all the experiments of this section, we set $\delta = 0.05$.

^10 Similar to the synthetic experiments, the case of $q = \infty$ has lower performance than other values of $q$, hence we only report the results for $q = 1, 2$.
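The identity rule used for classification, assigning a test vector to the block whose reconstructed vector leaves the smallest residual, can be sketched as follows. This is a minimal NumPy illustration with a toy dictionary. The function name `classify` and the list-of-blocks layout are assumptions, and the coefficient blocks `c_star` are taken as given rather than computed by a solver.

```python
import numpy as np


def classify(y, blocks, c_star):
    """Return argmin_i ||y - B[i] c*[i]||_2 over the blocks of the dictionary."""
    residuals = [np.linalg.norm(y - B_i @ c_i) for B_i, c_i in zip(blocks, c_star)]
    return int(np.argmin(residuals))


# Toy example: block 0 spans the first axis of R^3, block 1 the other two.
blocks = [np.eye(3)[:, :1], np.eye(3)[:, 1:]]
y = np.array([0.0, 1.0, 1.0])
c_star = [np.array([0.0]), np.array([1.0, 1.0])]   # assumed output of a convex program
print(classify(y, blocks, c_star))
```

Only the reconstructed vectors $B[i]c^*[i]$ enter the rule, so it depends on which blocks contribute rather than on the particular coefficients inside a redundant block.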



The recognition results for three dimensionality reduction methods are shown in Figure 5. As the results show, the NS and SVM methods have lower performance than methods based on sparse representation. This comes from the fact that the linear SVM assumes that the data in different classes are linearly separable while the face images have a multi-subspace structure, hence are not necessarily separable by a hyperplane. In the case of the NS method, subspaces associated to different classes are close to each other, i.e., have a small principal angle [20]. Since the test images are corrupted by errors, they can be close to the intersection of several subspaces, resulting in incorrect recognition. In addition, using the underlying subspaces ignores the distribution of the data inside the subspaces as opposed to the sparsity-based methods that directly use the training data. On the other hand, for a fixed value of $q$, the convex program $P'_{\ell_q/\ell_1}$ almost always outperforms $P_{\ell_q/\ell_1}$.

While the performances of different methods are close for a large number of training data in each class, the difference in their performances becomes evident when the number of data in each class decreases. More specifically, while the performance of all the algorithms degrades by decreasing the number of data in each class, the convex programs $P'_{\ell_q/\ell_1}$ are more robust to decreasing the number of training data. In other words, when the number of training data in each class is as small as the dimension of the face subspace, i.e., $m = d = 9$, $P'_{\ell_q/\ell_1}$ has 5% to 10% higher recognition rate than $P_{\ell_q/\ell_1}$. This result is similar to the result of synthetic experiments, where we showed that the gap between the performance of the two classes of convex programs is wider for non-redundant blocks than redundant blocks. It is also important to note that the results are independent of the choice of the features, i.e., they follow the same pattern for the three types of features as shown in Figure 5. In all of them $P'_{\ell_2/\ell_1}$ and $P'_{\ell_1/\ell_1}$ achieve the best recognition results (see [15] for experimental results on data with corruption, occlusion, and disguise).
VIII. Conclusions

We considered the problem of block-sparse recovery using two classes of convex programs, $P_{\ell_q/\ell_1}$ and $P'_{\ell_q/\ell_1}$ and, under a unified framework, we analyzed the recovery performance of each class of convex programs for both non-redundant and redundant blocks. Interesting avenues of further research include analysis of the stability of each convex program family in the presence of noise in the measured signal as well as generalizing our results to find recovery guarantees for mixed $\ell_q/\ell_p$-norm algorithms [44]. Investigating necessary and sufficient conditions based on the statistical analysis of the projected polytopes [45] via the mixed $\ell_q/\ell_1$-norms would also be of great importance. In particular, a geometrical study of the convex programs $P_{\ell_q/\ell_1}$ and $P'_{\ell_q/\ell_1}$ will help in a better understanding of the differences in their block-sparse recovery performance. For dictionaries whose blocks are drawn from certain distributions, the probabilistic analysis of meeting the conditions of Propositions 3 and 4 as well as Corollaries 2 and 3 will be the subject of future work. Finally, while there has been a lot of work on solving the $P_{\ell_q/\ell_1}$ convex program family quickly and efficiently [46], [47], extending such results to the $P'_{\ell_q/\ell_1}$ family and its unconstrained Lasso-type variations is an interesting open avenue for further research.
Appendix
A. Proof of Proposition 1

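Let $\Lambda_k = \{j_1^*, \ldots, j_k^*\}$ and $i^* \notin \Lambda_k$ be the set of indices for which $\zeta_k$ is obtained, i.e.,

$$\zeta_k = \max_{\Lambda_k} \max_{i \notin \Lambda_k} \sum_{j \in \Lambda_k} \mu(S_i, S_j) = \sum_{l=1}^{k} \mu(S_{i^*}, S_{j_l^*}). \qquad (70)$$

Denoting the sorted subspace coherences among all pairs of different subspaces by $\mu_S = \mu_1 \ge \mu_2 \ge \cdots$, we have

$$\zeta_k = \sum_{l=1}^{k} \mu(S_{i^*}, S_{j_l^*}) \;\le\; u_k = \sum_{l=1}^{k} \mu_l \;\le\; k\mu_S, \qquad (71)$$

which proves the desired result.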



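B. Proof of Proposition 2

We prove this result using contradiction.
($\Longrightarrow$) Assume there exists a $2k$-block-sparse vector $\bar{c} \neq 0$ such that $\bar{B}\bar{c} = 0$. We can write $\bar{c}^\top = [\bar{c}_1^\top \ \bar{c}_2^\top]$ where $\bar{c}_1$ and $\bar{c}_2$ are $k$-block-sparse vectors. So, we have

$$\bar{B}\bar{c} = [\,\bar{B}_1 \ \ \bar{B}_2\,] \begin{bmatrix} \bar{c}_1 \\ \bar{c}_2 \end{bmatrix} = 0 \ \Longrightarrow \ y \triangleq \bar{B}_1\bar{c}_1 = -\bar{B}_2\bar{c}_2. \qquad (72)$$

Thus, there exists a vector $y$ that has two $k$-block-sparse representations in $B$ using different sets of blocks. This contradicts the uniqueness assumption of the proposition.

($\Longleftarrow$) Assume there exists a vector $y$ that has two different $k$-block-sparse representations using $(\{i_l\}, \{s_{i_l} \in S_{i_l}\}) \neq (\{i'_l\}, \{s'_{i_l} \in S_{i'_l}\})$. Since for each block of $B$, we have $\text{rank}(\bar{B}[i]) = \text{rank}(B[i])$, there exist $\bar{c}_1$ and $\bar{c}_2$ such that $y = \bar{B}\bar{c}_1 = \bar{B}\bar{c}_2$, where $\bar{c}_1$ and $\bar{c}_2$ are different $k$-block-sparse vectors with the indices of their nonzero blocks being $\{i_l\}$ and $\{i'_l\}$, respectively. Also, $\bar{B}[i_l]\bar{c}_1[i_l] = s_{i_l}$ and $\bar{B}[i'_l]\bar{c}_2[i'_l] = s'_{i_l}$. Thus, we have $\bar{B}(\bar{c}_1 - \bar{c}_2) = 0$, which contradicts the assumption of the proposition since $\bar{c}_1 - \bar{c}_2$ is a $2k$-block-sparse vector.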

C. Proof of Corollary 1

We prove the result using contradiction.
($\Longrightarrow$) Assume there exists $B_n \in \mathbb{B}_{\tau}(\Lambda_n)$ such that $\text{rank}(B_n) < 2k$. So, there exists a $2k$-sparse vector $c^n \triangleq [c_1^n \ \cdots \ c_n^n]^\top$ such that $B_n c^n = \sum_{i=1}^{n} c_i^n s_i = 0$, where $s_i \in \mathbb{W}_{\tau,i}$ is the $i$-th column of $B_n$. For each full column-rank submatrix of $B[i]$, denoted by $\bar{B}[i] \in \mathbb{R}^{D \times d_i}$, there exists a unique $\bar{c}[i]$ such that $\bar{B}[i]\bar{c}[i] = c_i^n s_i$. Thus, $\bar{B}\bar{c} = 0$, where $\bar{B}$ is defined in (27) and $\bar{c}^\top = [\bar{c}[1]^\top \ \cdots \ \bar{c}[n]^\top]$ is a $2k$-block-sparse vector. This contradicts the uniqueness assumption using Proposition 2.

($\Longleftarrow$) Now, assume there exists a signal $y$ that has two different $k$-block-sparse representations in $B$. From Proposition 2, there exists a $2k$-block-sparse vector $\bar{c} \neq 0$ such that $\bar{B}\bar{c} = 0$. We can rewrite $\bar{B}[i]\bar{c}[i] = c_i^n s_i$, where $s_i \in \mathbb{W}_{\tau,i}$. Thus, we have $\bar{B}\bar{c} = B_n c^n = 0$, where $B_n \triangleq [s_1 \ \cdots \ s_n] \in \mathbb{B}_{\tau}(\Lambda_n)$ and $c^n \triangleq [c_1^n \ \cdots \ c_n^n]^\top$ is a $2k$-sparse vector. This implies $\text{rank}(B_n) < 2k$, which contradicts the assumption.

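D. Proof of Lemma 2

The idea of the proof follows the approach of Theorem 3.5 in El. Let $E_k = [e_{i_1} \ \cdots \ e_{i_k}] \in \mathbb{B}_{\alpha}(\Lambda_k)$ and $E_{\hat{k}} = [e_{i_{k+1}} \ \cdots \ e_{i_n}]$ where $\|e_{i_j}\|_2 \le \sqrt{1+\beta}$ for every $i_j \in \Lambda_{\hat{k}}$. Using matrix norm properties, we have

$$\|(E_k^\top E_k)^{-1} E_k^\top E_{\hat{k}}\|_{1,1} \;\le\; \|(E_k^\top E_k)^{-1}\|_{1,1}\, \|E_k^\top E_{\hat{k}}\|_{1,1}. \qquad (73)$$

We can write $E_k^\top E_k = I_k + D$, where

$$D \triangleq \begin{bmatrix} e_{i_1}^\top e_{i_1} - 1 & \cdots & e_{i_1}^\top e_{i_k} \\ \vdots & \ddots & \vdots \\ e_{i_k}^\top e_{i_1} & \cdots & e_{i_k}^\top e_{i_k} - 1 \end{bmatrix}. \qquad (74)$$

Since $E_k \in \mathbb{B}_{\alpha}(\Lambda_k)$, for any column of $E_k$, we have $\|e_i\|_2^2 \le 1 + \alpha$. Also, for any two columns $e_i$ and $e_j$ of $E_k$ we have

$$|e_i^\top e_j| \;\le\; \|e_i\|_2 \|e_j\|_2\, \mu(S_i, S_j) \;\le\; (1+\alpha)\, \mu(S_i, S_j). \qquad (75)$$

Thus, we can write

$$\|D\|_{1,1} \;\le\; \alpha + (1+\alpha)\zeta_{k-1}. \qquad (76)$$

If $\|D\|_{1,1} < 1$, we can write $(E_k^\top E_k)^{-1} = (I_k + D)^{-1} = \sum_{i=0}^{\infty} (-D)^i$, from which we obtain

$$\|(E_k^\top E_k)^{-1}\|_{1,1} \;\le\; \sum_{i=0}^{\infty} \|D\|_{1,1}^i = \frac{1}{1 - \|D\|_{1,1}} \;\le\; \frac{1}{1 - [\alpha + (1+\alpha)\zeta_{k-1}]}. \qquad (77)$$

On the other hand, $E_k^\top E_{\hat{k}}$ has the following form

$$E_k^\top E_{\hat{k}} = \begin{bmatrix} e_{i_1}^\top e_{i_{k+1}} & \cdots & e_{i_1}^\top e_{i_n} \\ \vdots & \ddots & \vdots \\ e_{i_k}^\top e_{i_{k+1}} & \cdots & e_{i_k}^\top e_{i_n} \end{bmatrix}. \qquad (78)$$

Since for each column $e_i$ of the matrix $E_k$ we have $\|e_i\|_2^2 \le 1 + \alpha$ and for each column $e_j$ of the matrix $E_{\hat{k}}$ we have $\|e_j\|_2^2 \le 1 + \beta$, we obtain

$$\|E_k^\top E_{\hat{k}}\|_{1,1} \;\le\; \sqrt{(1+\alpha)(1+\beta)}\;\zeta_k. \qquad (79)$$

Finally, substituting (77) and (79) into (73), we get

$$\|(E_k^\top E_k)^{-1} E_k^\top E_{\hat{k}}\|_{1,1} \;\le\; \|(E_k^\top E_k)^{-1}\|_{1,1}\, \|E_k^\top E_{\hat{k}}\|_{1,1} \;\le\; \frac{\sqrt{(1+\alpha)(1+\beta)}\,\zeta_k}{1 - [\alpha + (1+\alpha)\zeta_{k-1}]}. \qquad (80)$$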
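The bound of Lemma 2 can be sanity-checked numerically in the simplest setting. The sketch below is an illustration under stated assumptions, not part of the original proof. It takes one-dimensional subspaces spanned by unit-norm columns, so that $\alpha = \beta = 0$ and $\mu(S_i, S_j) = |e_i^\top e_j|$, and uses a hypothetical deterministic construction (`equiangular_frame`) in which all pairwise coherences are equal. The bound then reads $\zeta_k / (1 - \zeta_{k-1})$.

```python
import numpy as np


def norm_1_1(M):
    """Induced l1 -> l1 operator norm: the maximum absolute column sum."""
    return float(np.abs(M).sum(axis=0).max())


def cumulative_coherence(mu, k):
    """zeta_k for a coherence matrix mu with zero diagonal (zeta_0 = 0)."""
    if k <= 0:
        return 0.0
    n = mu.shape[0]
    return float(max(np.sort(np.delete(mu[i], i))[-k:].sum() for i in range(n)))


def equiangular_frame(D, n, t):
    """Unit-norm columns proportional to e_i + t * ones (illustrative construction)."""
    E = np.eye(D)[:, :n] + t
    return E / np.linalg.norm(E, axis=0)


def lemma2_sides(E, k_idx):
    """Return (lhs, rhs) of the Lemma 2 bound for unit-norm columns (alpha = beta = 0)."""
    n = E.shape[1]
    hat_idx = [j for j in range(n) if j not in k_idx]
    E_k, E_hat = E[:, k_idx], E[:, hat_idx]
    lhs = norm_1_1(np.linalg.solve(E_k.T @ E_k, E_k.T @ E_hat))
    mu = np.abs(E.T @ E)
    np.fill_diagonal(mu, 0.0)
    k = len(k_idx)
    zeta_k, zeta_km1 = cumulative_coherence(mu, k), cumulative_coherence(mu, k - 1)
    if zeta_km1 >= 1.0:
        raise ValueError("bound is vacuous: zeta_{k-1} >= 1")
    return lhs, zeta_k / (1.0 - zeta_km1)


E = equiangular_frame(D=10, n=6, t=0.05)   # all pairwise coherences equal 1/9
lhs, rhs = lemma2_sides(E, [0, 1, 2])
print(lhs, rhs, lhs <= rhs)
```

In this construction every pairwise coherence equals $1/9$, so for $k = 3$ the left-hand side is $3/11$ and the bound is $3/7$.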



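E. Proof of Proposition 4

Fix a set $\Lambda_k = \{i_1, \ldots, i_k\}$ of $k$ indices from $\{1, \ldots, n\}$ and denote by $\Lambda_{\hat{k}} = \{i_{k+1}, \ldots, i_n\}$ the set of the remaining indices. For a signal $x$ in the intersection of $\oplus_{i \in \Lambda_k} S_i$ and $\oplus_{i \in \Lambda_{\hat{k}}} S_i$, let $\breve{c}^*$ be the solution of the optimization program (52). We can write

$$x = \sum_{i \in \Lambda_k} B[i]\breve{c}^*[i] = B_k a_k, \qquad (81)$$

where $B_k \triangleq [s_{i_1} \ \cdots \ s_{i_k}]$ and $a_k \triangleq [a_{i_1} \ \cdots \ a_{i_k}]^\top$ are defined as follows. For every $i \in \Lambda_k$, if $\breve{c}^*[i] \neq 0$ and $B[i]\breve{c}^*[i] \neq 0$, define

$$s_i \triangleq \frac{B[i]\breve{c}^*[i]}{\|B[i]\breve{c}^*[i]\|_q}, \qquad a_i \triangleq \|B[i]\breve{c}^*[i]\|_q. \qquad (82)$$

Otherwise, let $s_i$ be an arbitrary vector in $S_i$ of unit Euclidean norm and $a_i = 0$. According to Definition 6, we have $B_k \in \mathbb{B}_{\epsilon'_q}(\Lambda_k)$.

Now, let $\hat{c}^*$ be the solution of the optimization program (53). We can write

$$x = \sum_{i \in \Lambda_{\hat{k}}} B[i]\hat{c}^*[i] = B_{\hat{k}}\, a_{\hat{k}}, \qquad (83)$$

where $B_{\hat{k}} \triangleq [\hat{s}_{i_{k+1}} \ \cdots \ \hat{s}_{i_n}]$ and $a_{\hat{k}} \triangleq [\hat{a}_{i_{k+1}} \ \cdots \ \hat{a}_{i_n}]^\top$ are defined in the following way. For every $i \in \Lambda_{\hat{k}}$, if $\hat{c}^*[i] \neq 0$ and $B[i]\hat{c}^*[i] \neq 0$, define

$$\hat{s}_i \triangleq \frac{B[i]\hat{c}^*[i]}{\|B[i]\hat{c}^*[i]\|_q}, \qquad \hat{a}_i \triangleq \|B[i]\hat{c}^*[i]\|_q. \qquad (84)$$

Otherwise, let $\hat{s}_i$ be an arbitrary vector in $S_i$ of unit Euclidean norm and $\hat{a}_i = 0$. Note that from Definition 6 we have $B_{\hat{k}} \in \mathbb{B}_{\epsilon'_q}(\Lambda_{\hat{k}})$.

Since $B_k \in \mathbb{B}_{\epsilon'_q}(\Lambda_k)$, assuming $\epsilon'_q \in [0, 1)$, the matrix $B_k$ is full column-rank from Corollary 1. Hence, we have $a_k = (B_k^\top B_k)^{-1} B_k^\top x$ and consequently,

$$\|a_k\|_1 = \|(B_k^\top B_k)^{-1} B_k^\top x\|_1. \qquad (85)$$

Substituting $x$ from (83) in the above equation, we obtain

$$\|a_k\|_1 \;\le\; \|(B_k^\top B_k)^{-1} B_k^\top B_{\hat{k}}\|_{1,1}\, \|a_{\hat{k}}\|_1. \qquad (86)$$

Using Lemma 2 with $\alpha = \epsilon'_q$ and $\beta = \epsilon'_q$, we have

$$\|(B_k^\top B_k)^{-1} B_k^\top B_{\hat{k}}\|_{1,1} \;\le\; \frac{(1+\epsilon'_q)\,\zeta_k}{1 - [\epsilon'_q + (1+\epsilon'_q)\zeta_{k-1}]}. \qquad (87)$$

Thus, if the right hand side of the above equation is strictly less than one, i.e., the sufficient condition of the proposition is satisfied, then from (86) we have $\|a_k\|_1 < \|a_{\hat{k}}\|_1$. Finally, using the definitions of $a_k$ and $a_{\hat{k}}$, we obtain

$$\sum_{i \in \Lambda_k} \|B[i]\breve{c}^*[i]\|_q = \|a_k\|_1 \;<\; \|a_{\hat{k}}\|_1 = \sum_{i \in \Lambda_{\hat{k}}} \|B[i]\hat{c}^*[i]\|_q, \qquad (88)$$

which implies that the condition of Theorem 2 is satisfied. Thus, $P'_{\ell_q/\ell_1}$ is equivalent to $P'_{\ell_q/\ell_0}$.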